CodeSOD: Commentary
"Include descriptive comments for each method," isn't bad advice. I mean, ideally, the method name and parameters would be descriptive enough that you don't need to add lots of comments, but more comments is rarely going to hurt things. Unfortunately, "include descriptive comments" usually decays into "include comments". And that's how we get piles of code like this one, from Patrick:
//
// Function name : CZiArecaRaidController::ReadAllRaidsetInfo
// Machine : w7gre7
// Environment : Visual Studio .Net 2008
// doxygen :
/// \fn CZiArecaRaidController::ReadAllRaidsetInfo(BSTR ContextInfo, IZiArecaDataCollection *pRaidsetInfoCollection, IZiArecaDataCollection *pVolumesetInfoCollection, IZiArecaDataCollection *pPhysicalDriveInfoCollection)
/// \brief
/// \details
/// \param ContextInfo
/// \param *pRaidsetInfoCollection
/// \param *pVolumesetInfoCollection
/// \param *pPhysicalDriveInfoCollection
/// \return STDMETHODIMP
/// \author (redacted)
/// \date 24.01.2011 09:59:10
//
STDMETHODIMP CZiArecaRaidController::ReadAllRaidsetInfo(BSTR ContextInfo, IZiArecaDataCollection **pRaidsetInfoCollection, IZiArecaDataCollection **pVolumesetInfoCollection, IZiArecaDataCollection **pPhysicalDriveInfoCollection)
{
// ...
}
//
// Function name : CZiArecaRaidController::GetArecaErrorMessage
// Description :
// Return type : string
// Argument : ARC_STATUS stat
// Author : (redacted)
// Machine : Lapgre5
// Environment : Visual Studio .Net 2005
// Date/Time : 05.06.2007 15:24:53
//
string CZiArecaRaidController::GetArecaErrorMessage(ARC_STATUS stat)
{
// ...
}
This is the secret sauce of bad documentation: just repeat information already in the code, include information that absolutely doesn't need to be there, and make the whole thing take up more space than the code itself. The only way to make this documentation worse is to make it wrong.
It's the useless information which mystifies me. While knowing what environment was used to build the code is useful, why tag that onto individual methods? Why track which machine made the change? Why do source control by comment when the team was already using Subversion?
There is one thing that the documentation includes, though, that's useful to us. Sometime between 2007 and 2011 they added Doxygen to their toolchain. Perhaps between 2011 and 2022 they've also added meaningful documentation which would make Doxygen useful, but probably not.
Tags: CodeSOD
Document Soup
An Enterprise Resource Planning system needs to keep track of your enterprise resources. Many of those resources, especially the expensive ones, need lots of documents tracked about them- inspections, service reports, maintenance bills, etc. So the ERP and the Document Management System need to talk.
Years ago, for Coyne, this presented a few problems. First, the ERP was a mainframe application running on an IBM mainframe. Second, it was getting retired. Third, the DMS didn't talk directly to it, but instead communicated through a terminal emulator and used configuration files to correctly parse the screen contents coming back from the mainframe.
The final, and key, problem was that the documents stored in the DMS had turned into big, confused piles of unrelated documents stored together. Specifically, if you looked for documents for the asset tagged 490, it'd fetch those, and also 49, 4900, 49000, or 490000. And vice versa: searches for 490000 would return all the other documents as well.
Now, within the mainframe, the same data might get displayed on multiple panels. So, for example, the tag for a requisition document might be displayed in RQ01 on one panel, RQ02 on another, and RQ03 on yet another. And each of these fields could format the value differently, "depending on design, whim, programmer laziness and phases of the moon," as Coyne writes.
RQ01: [0000000049]
RQ02: [        49]
RQ03: [49        ]
Now, the DMS configuration file had a few flags meant to help parse this. For any field, you could choose whether to strip leading zeros, whether to justify it left, right, or not at all, and whether to pad it back out with zeros.
The people who designed the module for handling the "fixed assets" documents, the specific documents giving Coyne issues, opted to strip leading zeroes, justify left, and then fill back with zeroes. They deployed this solution, and it had been running for years.
Let's see how it handles common situations:
Key field    => Normalized key
[49        ] => [4900000000]
[       490] => [4900000000]
[4900      ] => [4900000000]
[0000049000] => [4900000000]
Coyne sums it up:
Presto: document soup. Since there were three-quarters of a million tags in fixed-assets, a whole lot of document soup.
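To make the collision concrete, here is a minimal Java sketch of the deployed settings, assuming a 10-character key field (inferred from values like [0000000049]). The class and method names are hypothetical, and the right-justified variant is just one possible fix, not necessarily what Coyne's team ended up doing.

```java
public class KeyNormalizer {
    // Field width inferred from the 10-character examples; an assumption.
    public static final int WIDTH = 10;

    // The "fixed assets" settings as described: strip leading zeros,
    // left-justify, then pad back out with zeros.
    public static String deployedNormalize(String field) {
        // Drop justification spaces, then any leading zeros.
        String value = field.trim().replaceFirst("^0+", "");
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < WIDTH) {
            sb.append('0'); // left-justified, so the zeros land on the right
        }
        return sb.toString();
    }

    // Hypothetical alternative: pad with zeros on the left instead,
    // which keeps the numeric value (and therefore the key) intact.
    public static String rightJustifiedNormalize(String field) {
        String value = field.trim().replaceFirst("^0+", "");
        StringBuilder sb = new StringBuilder();
        for (int i = value.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(value).toString();
    }

    public static void main(String[] args) {
        String[] fields = {"49        ", "       490", "4900      ", "0000049000"};
        for (String f : fields) {
            System.out.println("[" + f + "] => [" + deployedNormalize(f)
                    + "] vs [" + rightJustifiedNormalize(f) + "]");
        }
    }
}
```

Running main shows all four keys collapsing to 4900000000 under the deployed settings, while padding on the left keeps 49, 490, 4900, and 49000 distinct.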
Tags: Feature Articles
The Tech Lead
Years ago, Karthika was working as a contractor. She got hired on to develop an intranet application for a small government project. The website in question was already in use, and was absolutely mission critical for the organization, but it also had a very small user base- roughly five users depended on this application.
When Karthika started, she immediately encountered a few surprises. The first was the size of the team- 8 developers, including a Tech Lead. That seemed like a large team for that small number of users, and that didn't even include the management overhead. The code base itself was similarly oversized; while the product was important, it was a pretty prosaic CRUD app with a few tricky financial calculations involved.
The core surprise was how awful the application was to use. It was slow to the point of sometimes timing out in the browser, even when running on your local machine. It was buggy, and even when Karthika patched those bugs, there was so much duplicated code in the system that the same bug would linger hidden in other screens for months. "I thought you said you'd fixed this," was a common refrain from the users.
This was long enough ago that the UI was built in ASP.Net WebForms, but new enough that the data access was handled by Entity Framework. And it was one specific feature of WebForms that they were abusing that made everything terrible: UserControls.
UserControls were designed to let developers create reusable widgets. For example a "User Information" screen may group the "User Name" and "Password" fields into a single "Credentials" UserControl, while the address fields would all get grouped together in an "Address" UserControl. That same "Credentials" control could then be dropped into other pages.
When the user interacts with this data, Entity Framework can look up a User object and hand it off to the UserControls, which allow the user to manipulate it; then the controls can invoke the save on the User.
The Tech Lead had a problem with this. You see, he didn't want to share the same reference across controls because of "separation of concerns". So instead, each UserControl would create its own User object, populate it with database values, and then let the user interact with it. This meant that each UserControl had its own picture of the User object, and when it was time to save the data on the page, one control could overwrite the changes made by another control.
So the Tech Lead invented CopyOldValues, a method which, during a save operation, would go out to the database, fetch the current data, and then compare it to the object being saved. Any NULL values in the object being saved would be updated to the database values, and then the object would be saved. This way, a UserControl could have a whole User object, but only populate the fields it was responsible for, leaving the rest as null. So yes, this meant that saving an object required two round-trips to the database, per UserControl. And each page could have multiple UserControls.
Karthika saw this, and put together a simple plan to fix this problem: just use the frameworks like they were meant to be used and cut this whole CopyOldValues nonsense out. She went to the Tech Lead and laid out a plan.
"This isn't an issue," he said. "You're wrong to be worrying about this. Stop wasting my time, and stop wasting yours. Instead, you should look into the date bug."
So, Karthika tracked down the issue related to the date bug. Specifically, the database and the application were supposed to allow certain date fields to be NULL. But, since CopyOldValues used NULLs to decide which data to save, it was impossible to update a stored value to NULL. Once again, the fix was obvious: just stop doing this weird intermediate step.
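The date bug falls straight out of the merge rule. Below is a minimal Java sketch of CopyOldValues as described (the original was C# on Entity Framework); the User class, its fields, and the copyOldValues signature are hypothetical stand-ins, and the database read is replaced by an in-memory "stored" object.

```java
import java.time.LocalDate;

public class CopyOldValuesSketch {
    // Hypothetical stand-in for the User entity.
    public static class User {
        public String name;
        public String address;
        public LocalDate endDate; // a date column that is allowed to be NULL

        public User(String name, String address, LocalDate endDate) {
            this.name = name;
            this.address = address;
            this.endDate = endDate;
        }
    }

    // The merge as described: any null on the object being saved is
    // replaced by the value currently stored in the database.
    public static User copyOldValues(User toSave, User stored) {
        if (toSave.name == null) toSave.name = stored.name;
        if (toSave.address == null) toSave.address = stored.address;
        if (toSave.endDate == null) toSave.endDate = stored.endDate;
        return toSave;
    }

    public static void main(String[] args) {
        User stored = new User("Ann", "1 Main St", LocalDate.of(2011, 1, 24));
        // A control that owns only the end date tries to clear it.
        User edited = new User(null, null, null);
        User saved = copyOldValues(edited, stored);
        System.out.println(saved.endDate); // prints 2011-01-24: the old date comes back
    }
}
```

Because null doubles as "this control didn't touch the field", a control that genuinely wants to clear endDate gets the stored date back, which is exactly the behavior Karthika ran into.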
"Wrong," the Tech Lead said. "That's totally not the correct way to do it. I have a better plan already."
The "better plan" was to create a custom save method for each UserControl, of which there were hundreds. Each one of these would define an array holding the string names of each field it was responsible for, and then the object and the array would get passed to a new method, FindDifferences, which would use reflection to inspect the object, copy the updated values to a new object, and prepare to save to the database.
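A rough Java analogue of the FindDifferences approach, assuming each control passes an array of the field names it owns. The class and method names are hypothetical, and java.lang.reflect.Field stands in for the .NET reflection the original would have used.

```java
import java.lang.reflect.Field;

public class FindDifferencesSketch {
    // Hypothetical entity with public fields so reflection can reach them by name.
    public static class User {
        public String name;
        public String address;
        public String email;
    }

    // Copies only the fields named in ownedFields from edited into a fresh User,
    // looking each one up by its string name via reflection.
    public static User findDifferences(User edited, String[] ownedFields)
            throws ReflectiveOperationException {
        User result = new User();
        for (String fieldName : ownedFields) {
            // Throws NoSuchFieldException at run time if the name is misspelled.
            Field field = User.class.getField(fieldName);
            field.set(result, field.get(edited));
        }
        return result;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        User edited = new User();
        edited.name = "Ann";
        edited.address = "1 Main St";
        // A hypothetical "Credentials" control that owns only the name field.
        User toSave = findDifferences(edited, new String[] {"name"});
        System.out.println(toSave.name + " / " + toSave.address); // prints "Ann / null"
    }
}
```

The field names are plain strings, so a typo only surfaces at run time, and every save pays for a reflective lookup per field on top of the database round trips that are still there.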
The shocking end result of this, however, is that this made the application even slower. It didn't reduce the number of database round trips, and it added this whole reflection step which made accessing properties even slower. Despite only having five users, and running on a decently powerful machine, it was nigh unusable. The Tech Lead knew what the problem was, though: the machine wasn't powerful enough.
Strangely, however, throwing hardware at the problem didn't fix it. So the Tech Lead invented his own caching solution, which made things worse. He started reinventing more wheels of Entity Framework and made things worse. He started copy/pasting utility functions into the places they were used to "reduce overhead", which didn't make performance worse but made every developer's life demonstrably worse as the repeated code just multiplied.
These problems made the customer angry, and that anger eventually turned into an all-hands meeting, with representatives from the client side and the project manager as well. After the venting and complaining was over, the project manager wanted explanations.
"Why," she said, "aren't we able to fix this?"
A round of blamestorming followed, but eventually, Karthika had to get specific and concrete: "We have a set of fixes that could address many of these problems, but the Tech Lead isn't letting us implement them and isn't giving us a good reason why we can't."
The project manager blinked, puzzled. "Who? There's no defined tech lead on this project. You're a team of peers."
"Well," the 'Tech Lead' said, "I… uh… have seniority."
"Seniority?" the project manager asked, no less confused. "You started two weeks earlier, and that was just because you were the one contractor on the bench and we needed someone to knowledge-transfer from the outgoing team."
The project manager had been overwhelmed by handling customer complaints, and hadn't been able to carve out time to attend any of the development team meetings. This meant that the Tech Lead's self-appointed promotion went unnoticed for eight months. At this point, the project was too far off the rails for any hope of recovery. The government office "fired" the contracting firm the next week, and all the developers, including Karthika, were fired from the contracting firm the week after that.
Tags: Feature Articles
CodeSOD: Classic WTF: The Old Ways
It's a holiday in the US today, so we're taking a trip into the past for a haunting classic about how things used to be. Original. -- Remy
Greg never thought he’d meet a real-life mentat.
“We’re so happy to have you aboard,” said Jordan, the CEO of IniTech. She showed Greg to the back end of the office, to a closed door marked with just one word: Frank. Jordan, not bothering to knock, opened the door.
Greg was overwhelmed with the stench of burned coffee and old-man smell. The office was unadorned and dark, the blinds drawn, illuminated by the blue light coming from an aging CRT screen. He saw a wrinkled scalp behind a tall, black office chair.
“I’m busy,” Frank said.
Jordan cleared her throat. “This is your new programming partner.”
“I’m Greg. It’s nice to meet you–” Greg offered his hand, but a wrinkled appendage slapped it away.
“Get yourself a chair. I know where everything is. You just show me you can type.”
Greg shot Jordan a glance as they left Frank’s office.
“He’s been with us 22 years,” she said. “He knows everything about our code. But his typing’s not what it used to be. Just do what he says. With some luck he’ll be retiring in a few months.”
Greg pulled a spare office chair into Frank’s den. He could see Frank’s face in profile now, resembling the mummy of Rameses II. Frank slid his keyboard to Greg. “Open C:\project.make in Vim,” Frank said, “and go to line 22.”
Greg thought it was odd that a makefile would sit right under C:\, but he did so. He moved the cursor to line 22.
“Increment $VERSION to 8.3.3.”
Greg noticed that Frank had his eyes shut, but humored him. In fact, line 22 did declare a $VERSION constant, and Greg changed it to 8.3.3.
“You’ll be suitable,” Frank said, crossing his arms. “You’ll do your work from the SMB server. Don’t make any changes without my authorization first.”
Back at his desk, Greg found the SMB server where Frank kept all of his code. Or rather, the SMB share mapped all of the files on Frank’s hard drive. Curious, Greg searched for .pas, .make, and other source files, wondering why Frank would keep his principal makefile under C:\.
There were 440 source files, about 200 megabytes, spread out all over the directory structure: C:\Windows\System32, C:\Users\Shared\Project, C:\Program Files\… Frank’s entire computer was the de facto source repository.
Greg knew if he ever had to make an on-the-fly change to the source, it would take hours just tracking down the right file on SMB. Surely they had a repository he could check changes into. Greg took a deep breath and re-entered Frank’s den.
“Frank, do we have any of this in a repo somewhere? I don’t want to SMB onto your computer every time we make a change. What if we have to patch something overnight?”
“What?!” Frank rose from his office chair, unsteady on his disused legs. “There will be no code changes without my direct supervision! It’s worked just fine for 22 years. Is that understood?”
Greg endured this for several months. Frank would harbor no suggestions of version control or repos. Everything, Frank said, was in his head. As long as no one changed the source without his permission, he would know where everything was.
Despite his frustrations, it greatly impressed Greg. Especially when Frank had memorized loop variables such as these:
for RecursiveWaypointCompressionThreadModuleIndexVerifierPropertyHandleIndex := 1 to 99 do ...
Less amusing was Frank’s insistence on using HEX constants for any encoded string. “You can’t trust any string encoding,” Frank said. It even extended to embedded web pages in their embedded manual:
const
ThirdWebPage : array of byte = [ $2d, $20, ... 660k OF HEX CONSTS..... ];
JQuery33WebPage : array of byte = [ $2d, $20, ... 3,660k OF HEX CONSTS..... ];
But Greg wondered. What would happen if he slipped in just a little change? How long would it take before Frank found out?
One night, he came into the office and logged into Frank’s SMB server. He opened a file and found an innocuous for-loop block. He replaced the twenty-something variable name with i, saved a backup on his own machine, and went home.
Greg arrived in the office late that morning, stuck in traffic, and was met by Jordan at the door. “Keep this quiet, but Frank just passed away.”
“Was it last night?”
“Brain aneurysm in his sleep.”
Frank probably died before he had a chance to see Greg’s unauthorized change. Greg would never know if Frank actually had the entire codebase memorized. Sometimes Greg would memorize a line or two, or find himself looking up mnemonic tricks to remember long sequences of characters. But it wasn’t like Frank rubbed off on him. Not really.
Tags: CodeSOD
Error'd: Up Up Down Down Left Right Left...
...Right B A. Right? Every so often, someone sends us a submission with a hidden agenda. Of course we get the usual solicitations for marriageable exotic beauties and offers to trade linkspam locations. But then there are the really interesting ones. Maybe they're legitimate, maybe they're photoshopped or otherwise faked, and maybe they're an attempt to bypass someone's ban on political propaganda or quack science. In any case, there isn't any of that here this week, but we're saving them up and maybe we'll feature a future issue of spot the fraud for you.
First up is dog lover George with a hysterical spam-blocking email address, sharing a help message that must have been crafted by Catbert himself. "My sixty seconds of glory awaits!" he howls, but then whimpers "I will be real disappointed if the agent isn't [Gone in Sixty Seconds headliner] Nicolas Cage."
Not to single out Insperity, though. Job hunter Quentin G. growls at iCIMS "I suppose since they don't have an email that does make them pretty unavailable." Anybody want to argue that at least "unvailable variable" is a better failure mode than "undefined"? I'm on the fence.
Music fan Joel has sent us a rash of submissions. His explanation is that "While either the image or the text are obviously a reasonable result for the search terms, the combination is... interesting." We suggest trying a different laundry detergent.
Bug Hunter David B. screenshots this from his iPhone. "I was just checking for a potential version error. I didn't find the one I was expecting."
One of the most famous rivers in Western history has been famously lost for centuries. How do you lose a river? Historians recently have declared it found, but even so, it is scarcely safe home and dry. Reader Jeremy Pereira reckons it's been smuggling messages through the Web. "A fairly run of the mill error on a Wikipedia page, but it ends with a heartbreaking plea." Can't somebody do something?
https://thedailywtf.com/articles/up-up-down-down-left-right-left
Tags: Error'd
CodeSOD: A Pointer to your References
John C works at a defense contractor, and his peers are well versed in C. Unfortunately, many years ago, a lot of their software started being developed in Java. While references are often described as "pointers, but safer," they are not pointers, so your intuitions about how memory gets allocated and released are likely to be wrong.
Which is definitely the case for John's peers. For example, in C, you generally want really clear understandings of who owns a given block of memory. You don't want to allocate memory and hand it off to another module without being really clear about who is responsible for cleaning it up later. This means that you'll often write methods that expect buffers and other blocks of memory passed into them, so that they don't have to worry about memory ownership.
Which is how we get this block of code:
Set myArrays = UniqueArrayUtils.getUniqueArrays(new LinkedHashSet());
public static Set getUniqueArrays(Set pUniqueArraySet) {
...
for (sensor in getSomeSensors()) {
if (blah == blahblah) {
pUniqueArraySet.add(new UniqueArray(sensor.special_id, sensor...));
}
}
...
return pUniqueArraySet;
}
"Arrays" here don't refer to arrays in the programming sense, but instead to arrays in the "sensor array" sense. This method is called only once, like you see it here, and could easily have been private.
But what you can see here is some vestige of "who owns the memory". getUniqueArrays could easily create its own Set, return it, and be done. But no, it needs to accept a Set as its input, to manipulate it.
In the scheme of things, this isn't terrible, but this pattern reasserts itself again and again. Methods which could easily construct and return objects instead expect empty objects passed into them.
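For contrast, a minimal sketch of the idiomatic Java shape: the method allocates and returns its own collection and lets the garbage collector worry about lifetime. Sensor, UniqueArray, and the selected flag are hypothetical stand-ins for the elided parts of John's code; records (Java 16+) are used so that duplicate arrays compare equal inside the Set.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueArraysSketch {
    // Hypothetical stand-ins for the article's sensor and sensor-array types.
    public record Sensor(int specialId, boolean selected) {}
    public record UniqueArray(int specialId) {}

    // The method allocates its own Set and returns it; the garbage collector
    // reclaims it once nothing references it, so no ownership contract is needed.
    public static Set<UniqueArray> getUniqueArrays(List<Sensor> sensors) {
        Set<UniqueArray> uniqueArrays = new LinkedHashSet<>();
        for (Sensor sensor : sensors) {
            if (sensor.selected()) { // stands in for the elided "blah == blahblah" check
                uniqueArrays.add(new UniqueArray(sensor.specialId()));
            }
        }
        return uniqueArrays;
    }

    public static void main(String[] args) {
        List<Sensor> sensors = List.of(
                new Sensor(7, true), new Sensor(7, true), new Sensor(9, false), new Sensor(3, true));
        Set<UniqueArray> myArrays = getUniqueArrays(sensors);
        System.out.println(myArrays); // prints [UniqueArray[specialId=7], UniqueArray[specialId=3]]
    }
}
```

The caller just writes getUniqueArrays(sensors); there is no empty LinkedHashSet to construct up front, and no question of who owns it afterwards.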
As John writes:
I imagine an angry C programmer saying, "What do you MEAN there's no pointers?!"
https://thedailywtf.com/articles/a-pointer-to-your-references
Tags: CodeSOD
A Basic Print Algorithm
In the late 90s, Aaron was employed at a small software company. When his coworker Mark submitted a letter of resignation, Aaron was assigned to maintaining the vast system Mark had implemented for an anonymous worldwide company. The system was built in the latest version of Visual Basic at the time, and connected to an Oracle database. Aaron had never written a single line of VB, but what did that matter? No one else in the company knew a thing about it, either.
Before Mark parted ways with the company, he and Aaron traveled to their customer's headquarters so that Aaron could meet the people he'd be supporting and see the system perform in its live environment. A fellow named Boris met them and gave a demo at his cubicle. At one point, he pressed the Print button to print out a report of certain records. After some serious hourglassing, the system displayed a dialog box asking, Do you want to print page 1?, with Yes and No as options. Boris chose No.
More hourglassing. Do you want to print page 2?
And on it went. Not only did Boris have to choose Yes or No for every page, the time to display each prompt was ridiculous, anywhere from 30 to 90 seconds per page.
By the time they crawled to page 30, Aaron was dying inside and could no longer restrain himself. "Why is it like this?!" he half-asked, half-accused Mark.
"The customer wanted it this way," Mark replied, either unaware of or indifferent to his coworker's frustration.
"This is the way we want it," Boris chimed in. "We don’t always want to print every page. Sometimes we just want one page, could be page 73."
"But why not give one prompt where the user can type in the specific pages they want?" Aaron asked.
"Is that possible?" Wide-eyed, Boris turned to Mark. "You told us it wasn't possible!"
"It isn't," Mark said with conviction.
Aaron flushed with embarrassment. He assumed he'd put his foot in his mouth in front of an important customer. "Sorry. You're the expert."
Still, the issue niggled at Aaron long after they returned from the customer's site. Once Mark had packed up his cube and departed for good, Aaron tracked down the report-printing code. He found a loop that started off something like this:
for k = 1 to SELECT MAX(PAGENO) from REPORTTABLE WHERE REPORTNUMBER = theReportNo
begin
SELECT * from REPORTTABLE WHERE REPORTNUMBER = theReportNo;
...
So, for every single page to be printed, the max page number was re-queried and every single record for the report was retrieved anew, never to be retained in memory.
Aaron didn't have to be a VB genius to realize how much this was killing performance. With a little experimentation, he figured out how to implement a dialog box more like the one he'd described in front of the customer. Boris and the other users were thrilled to receive the "impossible" time-saving fix, and Aaron learned an important lesson about how an Expert's word isn't necessarily gospel.
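The fix boils down to asking once for the pages and querying once for the data. Here is a minimal Java sketch of the first half: a hypothetical parsePages helper that turns input like "1-3, 73" into page numbers, assuming the maximum page number has already been fetched a single time. The article doesn't show Aaron's actual VB code, so this is illustrative only.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class PageSelectionSketch {
    // Parses a page specification like "1-3, 73" into sorted page numbers,
    // dropping anything outside 1..maxPage. maxPage is looked up once by the
    // caller rather than re-queried on every loop iteration.
    public static SortedSet<Integer> parsePages(String spec, int maxPage) {
        SortedSet<Integer> pages = new TreeSet<>();
        for (String part : spec.split(",")) {
            String item = part.trim();
            if (item.isEmpty()) {
                continue;
            }
            int dash = item.indexOf('-');
            int from;
            int to;
            if (dash >= 0) {
                from = Integer.parseInt(item.substring(0, dash).trim());
                to = Integer.parseInt(item.substring(dash + 1).trim());
            } else {
                from = Integer.parseInt(item);
                to = from;
            }
            for (int page = Math.max(1, from); page <= Math.min(to, maxPage); page++) {
                pages.add(page);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(parsePages("1-3, 73", 100)); // prints [1, 2, 3, 73]
    }
}
```

With the page list in hand, the report rows can be selected once and only the chosen pages printed, instead of re-running both queries for every page prompt.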
Tags: Feature Articles
CodeSOD: The Correct Browser
Sometimes, it's not the code that's bad, but what the code costs. For Elizabeth's company, that cost was significant in terms of dollars and cents. They needed to staff up to accomplish some major Java Enterprise work, so they went with the highest of the highly paid consultants they could find. These consultants came from a big name firm, and were billed at an eye-watering hourly rate.
Elizabeth warns us that the Java code is a behemoth of WTFs that is "too difficult to describe", but one particular WTF leapt out at her. Specifically, included in the application was a file called nonIEUser.html. This project was happening circa 2012, which is after Microsoft finally admitted standards might need to be a thing, and definitely well outside of the time when your web application should only work in Internet Explorer. For a greenfield project, there was no reason to do anything IE-only, and fortunately, they didn't, aside from forcing a check to yell at you if you didn't use IE.
This is the error page that it would display:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<head>
<title>Technical error</title>
<meta http-equiv="expires" content="0">
<meta http-equiv="Cache-Control" CONTENT="no-store,no-cache,must-revalidate,post-check=0,pre-check=0">
<meta http-equiv="Pragma" CONTENT="no-cache">
<link rel="STYLESHEET" type="text/css" href="../css/initech.css">
</head>
<body>
<html lang="en">
<p><strong><h2>This application can only be used with Internet Explorer!</h2></strong></p>
<p> </p>
<p><strong><h2>Other browsers are not supported.</h2></strong></p>
</html>
</body>
The "fun" part of this is that the page isn't wrapped in an <html> tag; instead, the <html> tag is embedded inside the <body>. In the omitted sections is a pile of JavaScript that didn't work in any browser, IE included.
The real killer, though, is that the consultants billed 32 hours on "enforcing IE only compatibility". As it usually goes with consultant-driven projects, nobody in Elizabeth's management blinked twice at paying through the nose for a feature they didn't need, implemented badly.
Tags: CodeSOD
The New Management
For a young college graduate in the early 80s, Argle was fortunate to already have some real-world experience. That served him well, because businesses which were looking towards the future were already looking into how they could improve their automation with the new and relatively cheap computer systems that were hitting the market.
One such company was a family-owned, multi-generational manufacturing company. They had a vision for the future, and the future involved the latest in CNC milling machines and robotic manufacturing. They needed the team that could send them into the future, and were hiring to build that team.
Argle was one of those hires, brought on as a junior developer. His mentor, Stanley, was an old Texas Instruments guy who had helped design many of the chips that were driving the fancy robots. Stanley leaned into his mentor role, both in terms of being a good mentor and in terms of the aesthetic: he was a bearded pipe smoker in a tweed jacket with patches on the elbows, and a pocket protector loaded with pens and a compact slide rule.
For a small manufacturing firm, the owner was happy to invest in this team. He brought on vets from JPL or mechanical engineers who had helped automate German auto plants. The owner himself heavily recruited from the same college that Argle attended, giving talks about the future of automation and reinforcing his company's commitment to the future. Interns and junior developers bulked out the team.
The owner walked the walk, talked the talk, and was putting money where it needed to go. The job was an absolute delight for Argle and the rest of the team. He learned a lot from Stanley, and also from the work itself.
And then, one day, the owner introduced Gordon. "This is our new President, Gordon. He'll be handling the overall operations of the company while I focus on our vision and direction."
Now, for most employees, the introduction of Gordon was barely worth commenting on. New management slotting into leadership positions was just background noise and didn't really impact the work. Except for Argle. Argle actually knew Gordon, at least by reputation, because Gordon was the VP at the local public broadcasting company.
Now you might wonder, "how does experience in broadcasting help someone oversee a manufacturing company?" Well, Argle had the inside scoop on exactly how Gordon would lead. Argle's father worked at the local PBS affiliate, and had regaled Argle with all sorts of stories about Gordon's style. That style was a blend of bullying and cronyism.
Now, up to this point, Argle's team had acted more or less like a leaderless collective. They all had a goal, they all understood the goal, and they all pushed together to achieve the goal. There was no manager. They'd defer to the senior members on matters of disagreement, but even then it was never "Stanley said so," and more "Stanley will explain this so everyone comes to an agreement."
That, of course, couldn't stand under Gordon's leadership. So Gordon hired Dave to provide management. Like Gordon, Dave also had no background in manufacturing, technology, automation or robotics. Or, in actuality, project management, as Dave illustrated in his first project meeting.
As this was the 80s, the conference room was a nicotine stained room with a transparency projector. Stanley sat in a corner, puffing away at his pipe. Dave had a stack of transparencies and a red dry erase marker to scribble on them with.
"So, Stanley," Dave said as he slapped one of the estimates Stanley had assembled onto the projector. "How long did you think this effort would take?"
Stanley pointed his pipe stem at the numbers Dave was projecting. "An effort like this will take a year."
"That's much too long," Dave said. "I was looking this over, and you had 6 weeks for milling these parts, but I think we can outsource that and get them back in three weeks. I have a vendor already committed." Dave edited the numbers with his red pen. "And then development time, you've got poor Argle booked for six months, after the hardware design is finalized, but we can start development earlier than that…"
People around the room started raising their objections. Dave had no time for these naysayers. "You would think that, but you haven't even finished with college," he told an intern. "Maybe things worked that way at JPL, but we live in the real world here." "If TI was such a good company, you'd probably still work there- either they suck or you're an idiot."
By the time Dave finished his tirade, he had individually insulted everyone on the team, and had cut the project time down to six months. "You see? We can totally do this project in six months."
Stanley took a few puffs of his pipe and said, "You can say it will take six months, but it will still take a year."
As Dave started piloting the team straight into the ground, Argle got an offer. A few of his college friends had moved out to another state, launched a startup, and were offering him a 40% wage increase plus moving expenses. Add to that the fact that Dave had explained that nobody on the team would be eligible for a raise for five years, and Argle was halfway out the door.
But only halfway. Argle was young, still had some optimism, and wanted to be loyal to his team, even if he wasn't loyal to the company. So he talked it over with Stanley.
"I like this team, and I like the work that we're doing, and I'd hate to leave the team in a lurch."
Stanley puffed on his pipe, and then answered. "The company will be sad to see you go. I'll be sad to see you go. But the company could lay you off tomorrow, and they'd be just as sad about it too. But they'd do it if they thought it was necessary. You don't owe this company anything more than that."
So Argle submitted his notice. By coincidence, it was on April First, which Dave tried to use as an excuse to bully Argle into feeling guilty about either a bad prank or bad timing for quitting. Dave wanted to make a counter offer, but he couldn't do it without insulting Argle on the way to offering him a raise, which made Argle's choice very easy.
Two weeks later, he was loading a truck with all his worldly possessions, and two weeks after that he was settled into a new house, and a new job, and even happier than he'd been at the manufacturing company.
Over a year later, Argle went back to visit family, and swung by the old company to see how the team was doing. Stanley was still there, but Dave and Gordon were long gone. The owner was fooled for a bit, but was too smart to stay fooled. Dave and Gordon were out the door only a few months after Argle left.
"So," he asked Stanley, "how'd that project go? How long did it take?"
Stanley puffed on his pipe and chuckled. "Oh, about a year."
Tags: Feature Articles
Error'd: Everything Old is New Again |
Whenever there's a major change in the world, it always takes application developers a little time to adjust. Remember when the US government thought it would be a great idea to mess around with their Daylight Saving Time schedule with only two years' warning? (I'm guessing nobody remembers the fiddling done by earlier administrations because they were too young to care, or not born yet.) Two years' warning probably seemed like plenty to non-technical legislators, who weren't thinking about all the software already in place with built-in calendars. Well, someone has apparently decided to one-up a measly time change, by inventing something called a New YEAR. This resets the entire calendar, and it must be a novel practice because surely websites wouldn't break due to some routine event that has been happening for at least a dozen years or more, right? Right?
Aspiring Poké trainer Valts S. began a long, long time ago, far, far away.
Resubmitter David B. is strong with the zero balances. "My friend was notified that his brand new insurance policy for 2022 is already past due. The interest is going to be killer."
Unlike David's friend, contributor JP is getting a head start. "I'll sign up with Netflix in a year, but I'm paying now."
While an anonymous contributor is way, way, way ahead of the game. Anonymous EA predicts Apex Legends will be very popular in 10 years.
Finally, if you're sick of all the weak puns, Ron K. has a favored medical provider. Good luck scheduling an appointment.
https://thedailywtf.com/articles/everything-old-is-new-again
Tags: Error'd |
CodeSOD: Well Trained |
Mandatory compliance training is a thing. The reasons behind it range from companies trying to reduce civil liabilities to actual legal mandates which require the training. The quality of mandatory training ranges from "useless" to "actively awful", and it's mostly PowerPoint-style slides interspersed with quizzes to make sure you were "paying attention". The worse ones usually have timers on the slides so you can't just click past; you have to sit idle, which is supposed to "force" you to read them.
Also, since legal compliance tends to move slower than technology, training built years ago is frequently still relevant. So, for example, Duncan's company built training back when you could reasonably expect Flash to run in the browser. Building the training and the content cost money, so once Flash got deprecated, they weren't just going to throw that money away; they found a contractor who'd convert it to "HTML5".
Now, this means that the code quality is garbage, which is fine. We can't really fault the tool. But there are some assumptions about the very use of the tool that render these quizzes even more useless than the usual fare:
function checkQuestions( bFeedback, bForce ) {
if( !bForce )
if( bFeedback && !forceCheckQuestions() ) return 0;
var ans_VarQuestion_05 = VarQuestion_05.getValue()
if( bFeedback && currFeedbackIdx == 0 && !qu84909.hasBeenProcessed) {
if( ans_VarQuestion_05 == 'A. ' ) {
settings = 'height=300,width=400,top='+(screen.height-300)/2+',left='+(screen.width-400)/2
if( is.ns ) settings += ",modal=yes,dialog=yes"
trivWndFeedback = new jsDlgBox( '84909', '20013', 'page81719.html', function(){ trivWndFeedback=null; setTimeout( 'checkLeavePage()', 100); }, 400, 300 );
trivWndFeedback.create();
return 0;
}
else if( ans_VarQuestion_05 == 'B. ' ) {
settings = 'height=300,width=400,top='+(screen.height-300)/2+',left='+(screen.width-400)/2
if( is.ns ) settings += ",modal=yes,dialog=yes"
trivWndFeedback = new jsDlgBox( '84909', '20013', 'page81714.html', function(){ trivWndFeedback=null; setTimeout( 'checkLeavePage()', 100); }, 400, 300 );
trivWndFeedback.create();
return 0;
}
else if( ans_VarQuestion_05 == 'C. ' ) {
settings = 'height=300,width=400,top='+(screen.height-300)/2+',left='+(screen.width-400)/2
if( is.ns ) settings += ",modal=yes,dialog=yes"
trivWndFeedback = new jsDlgBox( '84909', '20013', 'page81719.html', function(){ trivWndFeedback=null; setTimeout( 'checkLeavePage()', 100); }, 400, 300 );
trivWndFeedback.create();
return 0;
}
else if( ans_VarQuestion_05 == 'D. ' ) {
settings = 'height=300,width=400,top='+(screen.height-300)/2+',left='+(screen.width-400)/2
if( is.ns ) settings += ",modal=yes,dialog=yes"
trivWndFeedback = new jsDlgBox( '84909', '20013', 'page81719.html', function(){ trivWndFeedback=null; setTimeout( 'checkLeavePage()', 100); }, 400, 300 );
trivWndFeedback.create();
return 0;
}
}
if( !bFeedback ) currFeedbackIdx = 1;
return 1
}
Now, the page quite "securely" disabled right click, so it was "impossible" to open debugging tools or view source, short of knowing how to navigate menus or use keyboard shortcuts.
If you read the code carefully, you can tell that B. is the correct answer: the other three answers all open the same feedback page (page81719.html), but B. is the odd one out (page81714.html).
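The four branches above differ only in which feedback page they open. Here is a minimal Python sketch that collapses them into a lookup table and applies the "odd one out" reasoning mechanically. It is not the contractor's code. The names FEEDBACK_PAGES, feedback_page, and odd_one_out are hypothetical. It also assumes, as the reasoning above does, that the answer routed to a unique page is the correct one.

```python
from __future__ import annotations

from collections import Counter

# Feedback page each answer opens, copied from the four branches of
# checkQuestions() for question 84909. The table name is hypothetical.
FEEDBACK_PAGES = {
    "A. ": "page81719.html",
    "B. ": "page81714.html",
    "C. ": "page81719.html",
    "D. ": "page81719.html",
}


def feedback_page(answer: str) -> str | None:
    """Table-driven stand-in for the four copy-pasted if/else branches."""
    return FEEDBACK_PAGES.get(answer)


def odd_one_out(pages: dict[str, str]) -> list[str]:
    """Return the answers whose feedback page no other answer shares."""
    counts = Counter(pages.values())
    return [answer for answer, page in pages.items() if counts[page] == 1]


if __name__ == "__main__":
    print(feedback_page("C. "))         # page81719.html
    print(odd_one_out(FEEDBACK_PAGES))  # ['B. ']
```

The table makes the duplication obvious: three of the four entries are identical. That is exactly the tell that gives the answer away.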
Now, is this actually easier than just using common sense, because these trainings aren't designed to actually test people and instead just provide a veneer of plausible "we made them take a quiz" logic?
Probably not. But at least Duncan was more entertained than he would be by actually doing the training.
Tags: CodeSOD |
CodeSOD: Do Nothing |
Ivan encountered a strange bug. His organization uses the R language, which has a handy-dandy documentation language attached to it, for Rd files. The language itself is an organically grown hodge-podge of R and LaTeX, built to make it easy to format both plain text and R code within the documentation. It lets you use LaTeX-like commands, but also mix in R code to control the output.
Ivan's problem was that one of his macros, which we'll call \mymacro, only worked sometimes. The specific cases where it failed were where the macro expanded into multi-line output, which once upon a time wasn't something Rd supported. It is supported now, though, and it turned out not to be the problem. Ivan poked at it from that direction, suspecting a regression, and then spent a lot of time trying to understand the places where the macro did and didn't work.
It took some time, but eventually he found the key difference: the pages that worked also called another macro, \doi{}, which itself called \Sexpr[stage=build]{...}. Now, there's one important thing to note about the \Sexpr macro: it's meant to invoke R code inside of your documentation. And that made all the difference.
The documentation which didn't contain R code would be stored as a raw documentation file in the package. Before rendering the documentation, the parseRd tool would need to parse the file and generate the documentation output. This would happen after the package was built and distributed. Since \mymacro might expand into nothing, the file was technically unparseable at that point, and the documentation render would fail.
On the other hand, documentation which did contain R code would be parsed and the parse tree would be stored in the package. There would be no parse step when the documentation got rendered. The whole "expanding to nothing" problem didn't exist in this situation.
So the fix was obvious, at least to Ivan:
--- man/macros/macros.Rd
+++ man/macros/macros.Rd
@@ -1,2 +1,3 @@
+\newcommand{\mustbuild}{\Sexpr[results=hide,stage=build]{}}
-\newcommand{\mymacro}{\ifelse{html}{\out{...}}{...}}
+\newcommand{\mymacro}{\mustbuild\ifelse{html}{\out{...}}{...}}
He added a \mustbuild macro which hides the results of a null operation, then added a call to that macro inside \mymacro. The null \Sexpr exists only so that every page using \mymacro counts as containing build-time R code, which means the page gets parsed when the package is built. Now the documentation generates properly, even in older versions of R which don't support some of the macro techniques being used (since the parse tree itself is cached, after macro expansion is complete).
Tags: CodeSOD |
CodeSOD: Cloudy Optimizations |
Search engine optimization is both a dark art and a corrupt industry. Search providers work hard to keep their algorithms opaque. SEO is a mix of guessing and snake oil and sometimes outright lying.
For example, Mark M recently inherited a rather… bad PHP website. One of its notable SEO tweaks was that it had a tag cloud that slapped a bunch of keywords together to give a sense of what kinds of articles had been posted recently. At least, that was the idea. But when Mark dug into the code, there was no sign that there was any source of tags in the database. In fact, articles didn't get tagged at all. So where was the tag cloud coming from?
"tag_cloud">
Popular tags
class="tag_">span>
p>
div>
Yes, they just hard-coded a bunch of tags that they presumed would drive clicks, then dumped them into the document while applying a randomly selected CSS class to style them all differently.
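The article doesn't show the PHP behind this, so here is a Python sketch of the behavior just described: a fixed keyword list, each wrapped in a span with a randomly chosen class. Everything specific in it is an assumption made for illustration. That includes the tag list, the number of tag_N classes, and the render_tag_cloud name. Only the tag_cloud class, the tag_ class prefix, and the "Popular tags" label come from the fragment above.

```python
import html
import random

# Hypothetical keyword list: the real site's tags are not shown in the article.
TAGS = ["cloud hosting", "cheap domains", "php", "seo", "web design"]

# Hypothetical: assume the stylesheet defines classes tag_1 .. tag_5.
NUM_STYLES = 5


def render_tag_cloud(tags, rng=random):
    """Render a 'tag cloud' that is really a fixed list with random styling."""
    spans = []
    for tag in tags:
        style = rng.randint(1, NUM_STYLES)  # random class; says nothing about popularity
        spans.append(f'<span class="tag_{style}">{html.escape(tag)}</span>')
    return ('<div class="tag_cloud">\n'
            'Popular tags\n'
            '<p>' + ' '.join(spans) + '</p>\n'
            '</div>')


if __name__ == "__main__":
    print(render_tag_cloud(TAGS, random.Random(42)))
```

Note what's missing. No input comes from the articles or the database, so the "cloud" looks the same no matter what gets posted. Only the styling changes from page load to page load.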
It's… a choice. A series of choices, really. A series of bad choices, and I don't like any of it.
Tags: CodeSOD |
My Many Girlfriends |
In the long ago, wild-west days of the late 90s, there was an expectation that managers would put up with a certain degree of eccentricity from their software developers. The IT and software boom was still new, people didn't quite know what worked and what didn't, the "nerds had conquered the Earth" and managers just had to roll with this reality. So when Barry D gave the okay to hire Sten, who came with glowing recommendations from his previous employers, Barry and his team were ready to deal with eccentricities.
Of course, on the first day, building services came to Barry with some concerns about Sten's requests for his workspace. No natural light. No ventilation ducts that couldn't be closed. And then the co-workers who had interacted with Sten expressed their concerns.
During the hiring process, Sten had come off as a bit odd, but this seemed unusual. So Barry descended the stairs into the basement, to find Sten's office, hidden between a janitorial closet and the breaker box for the building. Barry knocked on the door.
"Sten awaits you. Enter."
Barry entered, and found Sten precariously perched on an office chair, removing several of the fluorescent bulbs from the ceiling fixture. The already dark space was downright cave-like with Sten's twilight lighting arrangement. "He welcomes you," Sten said.
"Uh, yeah, hi. I'm Barry, I'm working on the Netware 3.x portion of the product, and Carl just wanted me to check in. Everything okay?"
"This is acceptable to Sten," Sten said, gesturing at the dim office as he descended from the chair. Sten's watch beeped on the hour, and Sten carefully placed the fluorescent bulb off to the side, in a stack of similarly removed bulbs, and then went to his desk. In rapid succession, he popped open a few pill containers (5000mg of vitamin C, a handful of herbal and homeopathic pills) and gulped them down. He then washed the pills down with a tea that smelled like a mixture of kombucha and a dead raccoon buried in a dumpster.
"He is pleased to meet you," Sten said, with a friendly nod. Barry blinked, trying to track the conversation. "And he is pleased with it, and has made great progress on building it. You will like his things, yes?"
"Uh… yes?"
"He is pleased, and I hope you can go to him and tell him that he is pleased with this, and set his mind at ease about Sten."
So it went with Sten. He strictly referred to himself in the third person. He frequently spoke in sentences made of nothing but pronouns, and often reused the same pronoun to refer to different people. The vagueness was confounding, but Sten's skill was in Netware 2.x, a rare and difficult set of skills to find. So long as the code was clear, everything would be fine.
Everything was not fine. While Sten's code didn't have the empty vagueness of unclear pronouns, it also didn't have the clarity of meaningful variable names. Every variable and every method name was given a female first name. "Each of these is named for one of Sten's girlfriends." Given the number of names required, it was improbable that these were real girlfriends, but Sten gave no hint about this being fiction.
There was some consistency to the names. Instead of i, j, and k loop variables, you had Ingrid, Jane, and Katy. Zaria seemed to be used only as a parameter to methods. Karla seemed to be a temporary variable to hold intermediate results. None of these conventions were documented, obviously, and getting Sten to explain them was an exercise in confusion.
It led to some entertaining code reviews. "Michelle here talks to Nancy about Francine, and then Ingrid goes through Francine's purse to find Stacy." This described a method (Michelle) which called another method (Nancy), passing an array (Francine). Nancy iterates across the array (using Ingrid), to find a specific entry in the array (Stacy).
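To make that translation concrete, here is a hypothetical Python reconstruction of the code the review sentence describes. The role of each name follows Barry's translation. The function bodies are invented for illustration, and find_entry is a made-up conventional equivalent for comparison.

```python
# Hypothetical reconstruction in Sten's naming scheme. Roles follow Barry's
# translation of the review; the bodies themselves are invented.


def Nancy(Francine, Zaria):
    """Nancy (a method) looks through Francine (an array) for Zaria (a parameter)."""
    for Ingrid in range(len(Francine)):  # Ingrid plays the role of i
        Stacy = Francine[Ingrid]         # Stacy: the entry being looked for
        if Stacy == Zaria:
            return Stacy
    return None


def Michelle(Francine, Zaria):
    """Michelle (a method) talks to Nancy about Francine."""
    Karla = Nancy(Francine, Zaria)       # Karla: an intermediate result
    return Karla


# The same search with conventional names, indexed the same way for comparison.
def find_entry(entries, target):
    for i in range(len(entries)):
        if entries[i] == target:
            return entries[i]
    return None


if __name__ == "__main__":
    print(Michelle(["Ada", "Bea", "Cy"], "Bea"))    # Bea
    print(find_entry(["Ada", "Bea", "Cy"], "Dee"))  # None
```

The two versions behave identically. Only one of them can be reviewed without a translator.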
Sten lasted a few weeks at the job. It wasn't a very successful period of time for anyone. Peculiarities aside, the final straw wasn't the odd personal habits or the strange coding conventions- Sten just couldn't produce working code quickly enough to keep up with the rest of the team. Sten had to be let go.
A few weeks later, Barry got a call from a hiring manager at Initrode. Sten had applied, and they were checking the reference. "Yes, Sten worked here," Barry confirmed. After a moment's thought, he added, "I suggest that you bring him in for a second interview, and have him walk you through some code that he's written."
A few weeks after that, Barry got a gift basket from the manager at Initrode.
Thanks for the tip
Sten did not get hired at Initrode.
Tags: Feature Articles |
Error'd: Fin |
At the end of the year, it's customary to reflect on the past and imagine a future. Here at Error'd, reflecting on the past is natural, but all we can do about the future is hope. So to close out the longest 2020, here are a handful of little muffed missives.
Occasional contributor Peter diagnoses a counting error. "Looks like the web server had a thing or two to add to the discussion."
Long-time reader Willy M. has been sending us goodies for over a decade. This time he's a bit heated. "Hard to plan finances when your 0-month energy plan ends on invalid date."
Not to be outdone by Willy, multiple contributor Ryan S. shares a tasty morsel. "I wasn't brave enough to chance the mystery dish."
And in with a BOGO, it's Ryan S. again, who has found something emptier than a vacuum. "I guess Kohls started using qubits for inventory quantity."
Finally, to ring out the year (honestly, it really was 2021), handy shopper Stewart cleans up. "Screwfix maths goes screwy: £3.89 less the £0.64 tax makes £14.25??? Can’t even think what they did to make this so wrong."
Happy New Year, and we'll see you again on the other side of that arbitrary Gregorian divide.
Tags: Error'd |
Best of…: Best Of 2021: Totally Up To Date |
2021 has been a year that flew by so quickly it's hard to keep up. But keeping up with changes can frequently be harder than it seems.
The year was 2015. Erik was working for LibCo, a company that offered management software for public libraries. The software managed inventory, customer tracking, fine calculations, and everything else the library needed to keep track of their books. This included, of course, a huge database with all book titles known to the entire library system.
Having been around since the early 90s, the company had originally not implemented Internet connectivity. Instead, updates would be mailed out as physical media (originally floppies, then CDs). The librarian would plug the media into the only computer the library had, and it would update the catalog. Because the libraries could choose how often to update, these disks didn't just contain a differential; they contained the entire catalog over again, which would replace the whole database's contents on update. That way, the database would always be updated to this month's data, even if it hadn't changed in a year.
Time marched on. The book market grew exponentially, especially with the advent of self-publishing, and the Internet really caught on. Now the libraries would have dozens of computers, and all of them would be connected to the Internet. There was the possibility for weekly, maybe even daily updates, all through the magic of the World Wide Web.
For a while, everything Just Worked. Erik was with the company for a good two years without any problems. But when things went off the rails, they went fast. The download and update times grew longer and longer, creeping ever closer to that magic 24-hour mark where the device would never finish updating because a new update would be out before the last one was complete. So Erik was assigned to find some way, any way, to speed up the process.
And he quickly found such a way.
Remember that whole drop-the-database-and-replace-the-data thing? That was still happening. Over the years, faster hardware had been concealing the issue. But the exponential catalog growth had finally outstripped Moore's Law, meaning even the newest library computers couldn't keep up with downloading the whole thing every day. Not on library Internet plans.
Erik took it upon himself to fix this issue once and for all. It only took two days for him to come up with a software update, which was in libraries across the country 24 hours later. The total update time afterward? Only a few minutes. All he had to do was rewrite the importer/updater to accept lists of changed database entries, which numbered in the dozens, as opposed to full data sets, which numbered in the millions. Libraries no longer had to skip updates.
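The difference between the two update strategies fits in a few lines. Below is a minimal Python sketch. LibCo's real schema and change-list format aren't described, so the dict-based catalog, the (operation, book_id, record) tuples, and the function names are all assumptions.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical catalog: book ID -> record. The real schema and update
# format are not described in the article; both are assumptions here.
Catalog = Dict[str, dict]
Change = Tuple[str, str, dict]  # (operation, book_id, record)


def full_replace(catalog: Catalog, snapshot: Catalog) -> None:
    """Old approach: wipe the catalog and reload every record.

    Work grows with the size of the whole catalog (millions of entries).
    """
    catalog.clear()
    catalog.update(snapshot)


def apply_changes(catalog: Catalog, changes: Iterable[Change]) -> None:
    """New approach: apply only the entries that changed since the last update.

    Work grows with the number of changes (dozens), not the catalog size.
    """
    for op, book_id, record in changes:
        if op == "upsert":
            catalog[book_id] = record    # add a new entry or overwrite an old one
        elif op == "delete":
            catalog.pop(book_id, None)   # ignore deletes for entries already gone
        else:
            raise ValueError(f"unknown operation: {op!r}")


if __name__ == "__main__":
    catalog = {"b1": {"title": "Dune"}, "b2": {"title": "Emma"}}
    apply_changes(catalog, [
        ("upsert", "b3", {"title": "Ulysses"}),
        ("delete", "b2", {}),
    ])
    print(sorted(catalog))  # ['b1', 'b3']
```

The trade-off is worth noting. A change list only produces the right catalog if it is applied on top of the version the library already has. That means either every list gets applied in order, or the server sends everything that changed since the library's last update. A full snapshot has no such requirement, which fit the original mail-a-disk model where libraries chose how often to update.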
Erik's reward for his hard work? A coupon for a free personal pizza, which he suspected his manager clipped from the newspaper. But at least it was something.
https://thedailywtf.com/articles/best-of-2021-totally-up-to-date
Tags: Best of |
Best of…: Best of 2021: The Therac-25 Incident |
It's not always "fun" bugs and flaws. Earlier this year, we did a deep dive on a much more serious example of what can go wrong.
A few months ago, someone noted in the comments that they hadn't heard about the Therac-25 incident. I was surprised, and went off to do an informal survey of developers I know, only to discover that only about half of them knew what it was without searching for it.
I think it's important that everyone in our industry know about this incident, and upon digging into the details I was stunned by how much of a WTF there was.
Today's article is not fun, or funny. It describes incidents of death and maiming caused by faulty software engineering processes. If that's not what you want today, grab a random article from our archive, instead.
When you're strapping a patient to an electron gun capable of delivering a 25MeV particle beam, following procedure is vitally important. The technician operating the Therac-25 radiotherapy machine at the East Texas Cancer Center (ETCC) had been running this machine, and those like it, long enough that she had the routine down.
On March 21, 1986, the technician brought a patient into the treatment room. She checked their prescription, and positioned them onto the bed of the Therac-25. Above the patient was the end-point of the emitter, a turntable which allowed her to select what kind of beam the device would emit. First, she set the turntable to a simple optical laser mode, and used that to position the patient so that the beam struck a small section of his upper back, just to one side of his spine.
Image by Ajzh2074, own work, CC BY-SA 4.0.
With the patient in the correct position, she rotated the turntable again. There were two other positions. One would position an array of magnets between the beam and the patient; these would shape and aim the beam. The other placed a block of metal between the beam and the patient. When struck by a 25MeV beam of electrons, the metal would radiate X-rays.
This patient's prescription was for an electron beam, so she positioned the turntable and left the room. In the room next door, shielded from the radiation, was the control terminal. The technician started keying in the prescription to begin the treatment.
If things had been exactly following the routine, she'd have been able to communicate with the patient via an intercom, and monitor the patient via a video camera. Sadly, that system had broken down that day. Still, this patient had already had a number of treatments and knew what to expect, so that communication was hardly necessary. In fact, the Therac-25 and all the supporting equipment were always finicky, so "something doesn't work" practically was part of the routine.
The technician had run this process so many times that entering a prescription was second nature. She'd become an extremely fast typist, at least on this device, and perhaps too fast. In the field for beam type, she accidentally keyed in "X", for "x-ray". It was a natural mistake, as most patients got x-ray treatments, and it wasn't much of a problem: the computer would see that the turntable was in the wrong position and refuse to dose the patient. She quickly tapped the "UP" arrow on the keyboard to return to the field, corrected the value to "E", for electron, and confirmed the other parameters.
Her finger hovered over the "B" key on the keyboard while she confirmed her data entry. Once she was sure everything was correct, she pressed "B" for "beam start". There was no noise, there never was, but after a moment, the terminal read: "Malfunction 54", and then "Treatment Pause".
Error codes were no surprise. The technicians kept a chart next to the console, which documented all the error codes. In this case, "Malfunction 54" meant a "dose input 2" error.
That may not have explained anything, but the technician was used to the error codes being cryptic. And this was a "treatment pause", which meant the next step was to resume treatment. According to the terminal, no radiation had been delivered yet, so she hit the "P" key to unpause the beam.
That's when she heard the screaming.
The patient had been through a number of these sessions already, and knew they shouldn't feel a thing. The first time the technician activated the beam, however, he felt a burning sensation, which he later described like "hot coffee" being poured on his back. Without any intercom to call for help, he started to get off the treatment table. He was still extricating himself, screaming for help, when the technician unpaused the beam, at which point he felt something like a massive electric shock.
That, at first, was the diagnosis. A malfunction in the machine must have delivered an electric shock. The patient was sent home, and the hospital physicist examined the Therac-25, confirming everything was in working order and there were no signs of trouble. It didn't seem like it would happen again.
The patient had been prescribed a dose of 180 rads as part of a six-week treatment program that would deliver 6,000 rads in total. According to the Therac-25, the patient had received an underdose, a fraction of that radiation. No one knew it yet, but the malfunction had actually delivered between 16,000 and 25,000 rads. The patient seemed fine, but he had already received a fatal dose.
The ETCC incident was not the first, and sadly was not the last malfunction of the Therac-25 system. Between June 1985 and July 1987, there were six accidents involving the Therac-25, manufactured by Atomic Energy of Canada Limited (AECL). Each was a severe radiation overdose, which resulted in serious injuries, maimings, and deaths.
As the first incidents started to appear, no one was entirely certain what was happening. Radiation poisoning is hard to diagnose, especially if you don't expect it. As with the ETCC incident, the machine reported an underdose despite overdosing the patient. Hospital physicists even contacted AECL when they suspected an overdose, only to be told such a thing was impossible.
A few weeks later, there was a second overdose at ETCC, and it was around that time that the FDA and the press started to get involved. Early on, there was a great deal of speculation about the cause. Of interest is this comment that Jacky posted to the RISKS mailing list in 1986.
Here is my speculation of what happened: I suspect that the current in the electron beam is probably much greater in X-ray mode (because you want similar dose rates in both modes, and the production of X-rays is more indirect). So when you select X-rays, I'll bet the target drops into place and the beam current is boosted. I suspect in this case, the current was boosted before the target could move into position, and a very high current electron beam went into the patient.
How could this be allowed to happen? My guess is that the software people would not have considered it necessary to guard against this failure mode. Machine designers have traditionally used electromechanical interlocks to ensure safety. Computer control of therapy machines is a fairly recent development and is layered on top of, rather than substituting for, the old electromechanical mechanisms.
The Therac-25 was the first entirely software-controlled radiotherapy device. As that quote from Jacky above points out: most such systems use hardware interlocks to prevent the beam from firing when the targets are not properly configured. The Therac-25 did not.
The software included a number of key modules that ran on a PDP-11. First, there were separate processes for handling each key function of the system: user input, beam alignment, dosage tracking, etc. Each of these processes was implemented in PDP-11 Assembly. Governing these processes was a real-time OS, also implemented in Assembly. All of this software, from the individual processes to the OS itself, was the work of a single software developer.
AECL had every confidence in this software, though, because it wasn't new. The earliest versions of the software appeared on the Therac-6. Development started in 1972, and the software was adapted to the Therac-25 in 1976. The same core was also used on the Therac-20. Within AECL, the attitude was that the software must be safe because they'd been using it for so long.
In fact, when AECL performed their own internal safety analysis of the Therac-25 in 1983, they did so with the following assumptions:
1) Programming errors have been reduced by extensive testing on a hardware simulator, and under field conditions on teletherapy units. Any residual software errors are not included in the analysis.
2) Program software does not decay due to wear, fatigue, or reproduction errors.
3) Computer software errors are caused by faulty hardware components, and "soft" (random) errors induced by alpha particles or electromagnetic noise.
In other words: we've used the software for a long time and software always copies and deploys perfectly. So, any bugs we see would have to be transient bugs caused by radiation or hardware errors.
After the second incident at ETCC, the hospital physicist took the Therac-25 out of service and worked with the technician to replicate the steps that caused the overdose. It wasn't easy to trigger the "Malfunction 54" error message, especially when they were trying to methodically replicate the exact steps, because as it turned out, if you entered the data slowly, there were no problems.
To trigger the overdose, you needed to type quickly, the kind of speed that an experienced operator might have. The physicist practiced until he could replicate the error, then informed AECL. While he was taking measurements to see how large the overdoses were, AECL called back. They couldn't replicate the issue. "It works on my machine," essentially.
After being coached on the required speed, the AECL technicians went back to it, and confirmed that they could trigger an overdose. When the hospital physicist took measurements, they found roughly 4,000 rads in the overdose. AECL, doing similar tests, triggered overdoses of 25,000 rads. The reality is that, depending on the timing, the output was potentially random.
With that information, the root cause was easier to understand: there was a race condition. Specifically, when the technician mistyped "X" for x-ray, the computer would calculate out the beam activation sequence to deliver a high-energy beam to create x-rays. When the technician hit the "UP" arrow to correct their mistake, it should've forced a recalculation of that activation sequence—but if the user typed too quickly, the UI would update and the recalculation would never happen.
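Here is a heavily simplified Python model of the race just described. It is a sketch under stated assumptions, not AECL's design. The setup window is an abstract loop of SETUP_TICKS steps, a made-up number. Console, treat, and the version counter are hypothetical names. The model shows the stale-snapshot shape of the bug: the beam plan is computed from what's on screen when setup starts, so a correction that lands mid-setup updates the screen but not the plan. The fixed=True path shows one common mitigation, which is to re-plan if anything changed since the plan was made. The article does not say this is how AECL fixed it.

```python
from __future__ import annotations

SETUP_TICKS = 8  # hypothetical length of the beam-setup window, in abstract ticks


class Console:
    """Hypothetical shared state between data entry and beam setup."""

    def __init__(self, mode: str) -> None:
        self.screen_mode = mode  # what the terminal shows the operator
        self.version = 0         # bumped on every edit; only the fixed path reads it

    def edit(self, mode: str) -> None:
        self.screen_mode = mode
        self.version += 1


def treat(initial_mode: str, correction: str, correction_tick: int,
          fixed: bool = False) -> tuple[str, str]:
    """Simulate one setup; return (mode on screen, mode the beam is set up for).

    correction_tick < 0: the correction lands before setup starts (slow typist).
    0 <= correction_tick < SETUP_TICKS: it lands mid-setup (fast typist).
    """
    console = Console(initial_mode)
    if correction_tick < 0:
        console.edit(correction)
    while True:
        version_at_plan = console.version
        plan = console.screen_mode  # plan computed from the screen as it is right now
        for tick in range(SETUP_TICKS):
            if tick == correction_tick and console.screen_mode != correction:
                console.edit(correction)  # the screen updates immediately...
        if not fixed or console.version == version_at_plan:
            return console.screen_mode, plan  # ...but the buggy path never re-plans
        # Fixed path only: an edit slipped in during setup, so plan again.


if __name__ == "__main__":
    print(treat("X", "E", -1))             # ('E', 'E'): slow typist, no problem
    print(treat("X", "E", 3))              # ('E', 'X'): screen says E, beam set for X
    print(treat("X", "E", 3, fixed=True))  # ('E', 'E')
```

The dangerous output is the middle one. The operator sees "E" and has no reason to doubt it, while the beam is configured for x-ray mode. As the RISKS comment above suggests, the other half of the answer is an independent hardware interlock that checks the turntable position regardless of what the software believes.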
By the middle of 1986, the Food and Drug Administration (FDA) was involved, and demanded that AECL provide a Corrective Action Plan (CAP). What followed was a lengthy process of revisions as AECL would provide their CAP and the FDA would follow up with questions, resulting in new revisions to the CAP.
For example, the FDA reviewed the first CAP revision and noted that it was incomplete. Specifically, it did not include a test plan. AECL responded:
no single test plan and report exists for the software since both hardware and software were tested and exercised separately together for many years.
The FDA was not pleased with that, and after more back and forth, replied:
We also expressed our concern that you did not intend to perform the [test] protocol to future modifications to the software. We believe that rigorous testing must be performed each time a modification is made to ensure the modification does not adversely affect the safety of the system.
While AECL struggled to include complex tasks like testing in their CAP, they had released instructions that allowed for a temporary fix to prevent future incidents. Unfortunately, in January, 1987, there was another incident, caused by a different software bug.
In this bug, there was a variable shared by multiple processes, meant as a flag to decide whether or not the beam collimator in the turntable needs to be checked to ensure everything is in the correct position. If the value is non-zero, the check needs to be performed. If the value is zero, it does not. Unfortunately, the software would increment the field, and the field was only one byte wide. This meant every 256th increment, the variable would be zero when it should have been non-zero. If that incorrect zero lined up with an operator action, the beam would fire at full energy without the turntable in the right position.
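The wraparound is easy to reproduce. The Therac-25 code was PDP-11 assembly. This Python model emulates the one-byte variable by masking with 0xFF. The function names and the "pass" loop are hypothetical, and the model only reproduces the arithmetic described above. It does not model timing: in the real incident, a skipped check mattered only when it lined up with an operator action. The set_flag function mirrors the fix the article describes, which is to stop incrementing and just set the value.

```python
from __future__ import annotations

from typing import Callable


def increment_u8(value: int) -> int:
    """Buggy update: increment a one-byte counter, so 255 + 1 wraps to 0."""
    return (value + 1) & 0xFF  # mask to 8 bits, like a one-byte memory cell


def set_flag(_value: int) -> int:
    """Fixed update: just set a non-zero value."""
    return 1


def check_needed(flag: int) -> bool:
    """Non-zero means 'check the turntable position'; zero means skip it."""
    return flag != 0


def skipped_checks(update: Callable[[int], int], passes: int) -> list[int]:
    """Return the pass numbers on which the position check would be skipped."""
    flag = 0
    skipped = []
    for n in range(1, passes + 1):
        flag = update(flag)  # each pass "requests" a check by updating the flag
        if not check_needed(flag):
            skipped.append(n)
    return skipped


if __name__ == "__main__":
    print(skipped_checks(increment_u8, 600))  # [256, 512]
    print(skipped_checks(set_flag, 600))      # []
```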
AECL had a fix for that (stop incrementing and just set the value), and amended their CAP to include that fix. The FDA recognized that was probably going to fix the problem, but still had concerns. In an internal memo:
We are in the position of saying that the proposed CAP can reasonably be expected to correct the deficiencies for which they were developed (Tyler). We cannot say that we are [reasonably] confident about the safety of the entire system…
This back-and-forth continued through a number of CAP revisions. At each step in the process, the FDA found issues with testing. AECL's test process up to this point was simply to run the machine and note if anything went wrong. Since the software had been in use, in some version, for over a decade, they did not see any reason to test the software, and thus had no capacity or plan for actually testing the software when the FDA required it.
The FDA, reviewing some test results, noted:
Amazingly, the test data presented to show that the software changes to handle the edit problems in the Therac-25 are appropriate prove the opposite result. … I can only assume the fix is not right, or the data were entered incorrectly.
Eventually, the software was fixed. Legislative and regulatory changes were made to ensure incidents like this couldn't happen in the future, at least not the same way.
It's worth noting that there was one developer who wrote all of this code. They left AECL in 1986, and thankfully for them, no one has ever revealed their identity. And while it may be tempting to lay the blame at their feet—they made every technical choice, they coded every bug—it would be wildly unfair to do that.
With AECL's continued failure to explain how to test their device, it should be clear that the problem was a systemic one. It doesn't matter how good your software developer is; software quality doesn't appear because you have good developers. It's the end result of a process, and that process informs not only your software development practices but also your testing. Your management. Even your sales and servicing.
While the incidents at the ETCC finally drove changes, they weren't the first incidents. Hospital physicists had already reported problems to AECL. At least one patient had already initiated a lawsuit. But that information didn't propagate through the organization; no one put those pieces together to recognize that the device was faulty.
On this site, we joke a lot at the expense of the Paula Beans and Roys of this world. But no matter how incompetent, no matter how reckless, no matter how ignorant the antagonist of a TDWTF article may be, they're part of a system, and that system put them in that position.
Failures in IT are rarely individual failures. They are process failures. They are systemic failures. They are organizational failures. The story of AECL and the Therac-25 illustrates how badly organizational failures can end up.
AECL did not have a software process. They didn't view software as anything more than a component of a larger whole. In that kind of environment, working on safety critical systems, no developer could have been entirely successful. Given that this was a situation where lives were literally on the line, building a system that produced safe, quality software seems like it should have been a priority. It wasn't.
The Therac-25 incident may be ancient history, but software has only become more important since. We would hope safety-critical software has rigorous processes, but we know that isn't always true. The 737 MAX is an infamous, recent example. And with software so central to the modern world, even relatively trivial software problems can get multiplied at scale. Whether it's machine learning reinforcing racism, social networks turning into cesspools of disinformation, or poorly secured IoT devices turning into botnets, our software exists and interacts with the world, and it has real-world consequences.
If nothing else, I hope this article makes you think about the process you use to create software. Is the process built to produce quality? What obstacles to quality are there? Is quality a priority, and if not, why not? Does your process consider quality at scale? You may know your software's failure modes, but do you understand your organization's failure modes? Its blind spots? The assumptions it makes which may not be valid in all cases?
Let's return for a moment to the race condition that caused the ETCC incidents. This was caused by users hitting the up arrow too quickly, preventing the system from properly registering their edits. While the FDA CAP process was grinding along, AECL wanted to ensure that people could still use the Therac-25 safely, and that meant publishing quick fixes that users could apply to their devices.
This is the letter AECL sent out to address that bug:
SUBJECT: CHANGE IN OPERATING PROCEDURES FOR THE THERAC-25 LINEAR ACCELERATOR
Effective immediately, and until further notice, the key used for moving the cursor back through the prescription sequence (i.e., cursor "UP" inscribed with an upward pointing arrow) must not be used for editing or any other purpose.
To avoid accidental use of this key, the key cap must be removed and the switch contacts fixed in the open position with electrical tape or other insulating material.
For assistance with the latter you should contact your local AECL service representative.
Disabling this key means that if any prescription data entered is incorrect, then the "R" reset command must be used and the whole prescription reentered.
For those users of the Multiport option, it also means that editing of dose rate, dose, and time will not be possible between ports.
On one hand, this is a simple instruction that would effectively prevent the ETCC incidents from recurring. On the other, it's terrifying to imagine a patient's life hanging on a pried-off keycap and some electrical tape.
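To make the race condition behind the ETCC incidents concrete, here is a deliberately simplified, hypothetical model. It is not the Therac-25's actual task structure, and the names `Prescription`, `set_up_naive`, `set_up_checked`, and `one_quick_edit` are invented for illustration. Setup copies the treatment mode and then runs a slow hardware step; an edit that lands during that step is never seen, so setup finishes with stale settings. The checked variant records an edit counter and restarts setup if the prescription changed underneath it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Prescription:
    """Hypothetical operator-entered settings, with a counter bumped on each edit."""
    mode: str
    version: int = 0

    def edit(self, mode: str) -> None:
        self.mode = mode
        self.version += 1


def set_up_naive(rx: Prescription, slow_step: Callable[[], None]) -> str:
    """Copy the mode, run the slow step, then act on the (possibly stale) copy."""
    mode = rx.mode
    slow_step()  # an edit made here is never noticed
    return mode


def set_up_checked(rx: Prescription, slow_step: Callable[[], None]) -> str:
    """Repeat setup until no edit arrived during the slow step."""
    while True:
        seen = rx.version
        mode = rx.mode
        slow_step()
        if rx.version == seen:
            return mode


def one_quick_edit(rx: Prescription, new_mode: str) -> Callable[[], None]:
    """Simulate an operator who edits once while the slow step is running."""
    pending = [new_mode]

    def slow_step() -> None:
        if pending:
            rx.edit(pending.pop())

    return slow_step


rx = Prescription(mode="x-ray")
print(set_up_naive(rx, one_quick_edit(rx, "electron")), rx.mode)  # x-ray electron
rx = Prescription(mode="x-ray")
print(set_up_checked(rx, one_quick_edit(rx, "electron")))  # electron
```

In the naive version, the screen (`rx.mode`) says one thing while setup acted on another. Restarting on a version mismatch is only one possible design, shown here to illustrate the race, not as a recommendation for safety-critical code.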
This article is intended as a brief summary of the incident. Most of the technical details in this article come from this detailed account of the Therac-25 incident. That is the definitive source on the subject, and I recommend reading the whole thing. It contains much more detail, including deeper dives into the software choices and organizational failures.
https://thedailywtf.com/articles/best-of-2021-the-therac-25-incident
Best of…: Best Of 2021: Worlds Collide |
As we take inventory of the past year, let's look back on one way people track history. --Remy
George had gotten a new job as a contractor at a medium-sized book distributor. He arrived nice and early on Day 1, enthusiastic about a fresh start in a new industry.
His "office" turned out to be a huge warehouse stacked high with books. Upon greeting him, his manager pointed him to a PC in the corner of the warehouse, sitting on a desk with no partitions around it. The manager leaned over the machine and made a few double-clicks with the mouse until he opened up the H: drive. "There you go," he muttered, then left.
George stared after him, perplexed, wondering if the manager intended to bring over coffee or other coworkers to meet him. The way he was walking, though, seemed to convey that he had more important things to be doing than coddling greenhorns.
"You must be George. Hi, I'm Wally." Another gentleman came over with his hand poised to shake. "I handle the software we use to track inventory. Let me show you the ropes."
Wally used the nearby computer to demonstrate a handful of the 200-odd Delphi forms that constituted the inventory application. The source code was not in any kind of source control; it was all in a folder named Wally on the shared H: drive. They were using a version of Delphi from 1995 ... in 2010. Their database was some vague, off-brand SQL-esque construct that George later learned had been dropped from support as of 2003.
None of this inspired George's confidence, but he had a job to learn. Stifling a sigh, he asked Wally, "Could I have a copy of your database creation script? Then I could start with a fresh and empty database to learn on."
"No problem. Come with me."
Wally led George to another part of the warehouse where a different computer was set up; presumably, this was Wally's desk. Wally sat down at the machine and began typing away while tapping his foot and whistling a little tune.
This went on, and on ... and on. It certainly didn't seem like the quick typing one would do to create an email with an attachment. George shifted his weight uneasily from one foot to the other. As the rhythmic typing and whistling continued, it hit him: Wally was typing out the entire CREATE DATABASE code—from memory.
It took Wally a good 25 minutes to bang out everything needed to define 60-odd database fields including Title, ISBN, ISBN-19, Author, Publisher, etc. Finally, the one-man concert ceased, and Wally sent the email. With a perfectly normal look on his face, he turned to George and said, "There it is!"
In the moment, George was too flabbergasted to question what he'd witnessed. Later, he confirmed that Wally had never even thought to have a saved CREATE DATABASE SQL script on hand. Sadly, this was far from the last point of contention he experienced with his coworker. Wally could not comprehend why George might want some general utility functions, or a clean interface between modules, or anything more advanced than what one found in chintzy programming manuals. George's attempts at process improvement and sanity introduction got his building access card invalidated one morning about a month after starting. No one had expressed any sort of warning or reproach to him beforehand. George was simply out, and had to move on.
Move on he did ... but every once in a while, George revisits their old website to see if they're still in business. At the moment, said website has an invalid certificate. For a company whose whole business came down to head-scratching practices heaped upon 15-year-old unsupported tools, it's not so surprising.
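The habit Wally lacked is cheap to adopt: keep the schema as a script and build every fresh database from it. The sketch below uses Python's built-in `sqlite3` module as a stand-in for their unnamed, long-unsupported database. SQLite has no `CREATE DATABASE` statement, so the script only creates a table, and the table name, columns, and types are assumptions based on the handful of fields the story mentions.

```python
import sqlite3

# Hypothetical schema script: a small subset of the fields from the story.
# In practice this would live in a .sql file next to the application code.
SCHEMA_SQL = """
CREATE TABLE books (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    isbn      TEXT UNIQUE,
    author    TEXT,
    publisher TEXT
);
"""


def create_empty_database(path: str = ":memory:") -> sqlite3.Connection:
    """Build a fresh, empty database from the saved schema script."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA_SQL)  # runs every statement in the script
    return conn


conn = create_empty_database()
print([row[1] for row in conn.execute("PRAGMA table_info(books)")])
```

A newcomer asking for "a fresh and empty database to learn on" then gets one in milliseconds, and the schema is written down somewhere other than one person's memory.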
https://thedailywtf.com/articles/best-of-2021-worlds-collide
Best of…: Best of 2021: It's a Gift |
Per tradition, we're taking the week before the new year as a chance to review some of our favorites from this year. We open with this one from way back in January. Consider it… a gift.
Tyra was standing around the coffee maker with her co-workers when their phones all dinged with an email from management.
Edgar is no longer employed at Initech. If you see him on the property, for any reason, please alert security.
"Well, that's about time," Tyra said.
They had all been expecting an email like that. Edgar had been having serious conflicts with management. The team had been expanding recently, and along with the expansion, new processes and new tooling were coming online. Edgar hated that. He hated having new co-workers who didn't know the codebase as intimately as he did. "My technical knowledge is a gift!" He hated that they were moving to a CI pipeline that had more signoffs and removed his control over development. "My ability to react quickly to needed changes is a gift!" He hated that management, and fellow developers, were requesting more code coverage in their tests. "I write good code the first time, because I've got a gift for programming!"
These conflicts escalated, never quite to screaming, but voices were definitely raised. "You're all getting in the way," was a common refrain from Edgar, whether it was to his new peers or to management or to the janitor who was taking too long to clean the restroom. It seemed like everyone knew Edgar was going to get fired but Edgar.
Six months later, the team was humming along nicely. Pretty much no one thought about Edgar, except maybe to regale newbies with a tale of the co-worker from hell. One day, Tyra finished a requirement, ensured all the tests were green in their CI server, and then submitted a pull request. One of her peers reviewed the request, merged it, and pushed it along to their CD pipeline.
Fortunately for them, part of the CD step was to run the tests again, and one of the tests failed. The failing test had nothing to do with any of the changes in Tyra's PR. In fact, the previous commit had passed the same unit test, and the relevant code was identical in both versions in source control.
Tyra and her peers dug in, trying to see what might have changed in the CD environment, or what was maybe wrong about the test. Before long, they were digging through the CD pipeline scripts themselves. They hadn't been modified in months, but was there maybe a bad assumption somewhere? Something time based?
No. As it turned out, after many, many hours of debugging, there were a few "extra" lines in one of the shell scripts. The script would randomly select one of the Python files in the commit, and a small percentage of the time, it would choose a random line in that file and replace the spaces on that line with tabs. Since whitespace is syntactically significant in Python, that produced a file which failed with an IndentationError.
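Here is a minimal sketch of why that sabotage broke the build, using only Python's built-in `compile()`; the helper names `tabify_line` and `compiles` are hypothetical. Turning the spaces in an indented line into tabs makes its indentation inconsistent with its neighbors, which Python rejects with `TabError`, a subclass of `IndentationError`. A line with no leading indentation, on the other hand, survives the same treatment, one more reason the failures looked random.

```python
def tabify_line(source: str, index: int) -> str:
    """Replace every space on one line (0-based index) with a tab."""
    lines = source.splitlines(keepends=True)
    lines[index] = lines[index].replace(" ", "\t")
    return "".join(lines)


def compiles(source: str) -> bool:
    """Return True if the source parses; False on IndentationError (incl. TabError)."""
    try:
        compile(source, "<pipeline-copy>", "exec")
    except IndentationError:
        return False
    return True


SOURCE = "def total(x):\n    y = x + 1\n    return y\n"

print(compiles(SOURCE))                  # True
print(compiles(tabify_line(SOURCE, 2)))  # False: indented line now uses tabs
print(compiles(tabify_line(SOURCE, 0)))  # True: "def\ttotal(x):" is still valid
```

A pipeline step that fails the build when `python -m compileall` reports errors for the exact files about to ship would also catch this in files with no unit tests at all.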
A quick blame confirmed that Edgar had left them that little gift. As for how it had gone unnoticed for so long? Well, for starters, he had left during that awkward transition period when they were migrating to new tools. The standard code-review/peer-review processes weren't fully settled, so he was able to sneak in the commit. The probability that it would tamper with a file was very low, and it wouldn't repeat itself on the next build.
It was hard to say how many times this had caused a problem. If a developer saw the unit test fail after accepting the PR, they may have just triggered a fresh build manually. But, more menacing, they didn't have 100% unit test coverage, and there were some older Python files (mostly written by Edgar) which had no unit tests at all. How many times might they have pushed broken files to production, only to have mysterious failures?
In the end, Edgar's last "gift" to the team was the need to audit their entire CI/CD pipeline to see if he left any more little "surprises".
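One check such an audit might add, sketched here with hypothetical names: hash every Python file in the committed tree and in the workspace the pipeline is about to ship, and fail if any digest differs. This assumes both trees are available on disk as plain directories, and it only helps if the check runs after every step that can touch the workspace.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each .py file's path, relative to root, to its SHA-256 hex digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.py"))
    }


def changed_files(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Names that are missing, extra, or whose digests differ between two trees."""
    return sorted(
        name
        for name in expected.keys() | actual.keys()
        if expected.get(name) != actual.get(name)
    )


def verify_workspace(committed: Path, workspace: Path) -> None:
    """Raise if the workspace no longer matches the committed sources."""
    diff = changed_files(file_digests(committed), file_digests(workspace))
    if diff:
        raise RuntimeError(f"workspace differs from commit: {', '.join(diff)}")
```

Comparing bytes rather than running tests means a tampered file is caught even when nothing exercises it, which was exactly the gap in Edgar's older, untested files.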
Error'd: Some Like It Hotter |
As we fast approach the end of the Gregorian calendar year, things start to happen all at once, just to get them over with. According to Daniel D., "It's November 21 and Facebook can't decide which tomorrow comes first."
Not only simultaneous, but topsy-turvy, as Florian pontificates: "I have no idea how such a bug is possible. All I know is that our Christmas get-together is in a couple of hours and I'll need to drink quite a bit to catch up to whoever's in charge of this XCode screen!"
Gary A. got into some deep water. "I was on a cruise recently, and their app provided a rather odd phone number for medical emergencies. I hope I can remember it."
Before leaving on his own vacation, Reinier B. wants to turn things down a bit. This seems really very awfully vochtig (humid) to me. Reinier explains, "My 'smart thermostat' really isn't."
And again, with an unprecedented twofer, Reinier blasts, "This is the same thermostat as before, reporting several orders of magnitude hotter than the hottest things in the universe." It's even hotter than Australia!