CodeSOD: Shorely a Bad Choice |
"This was developed by the offshore team," is usually spoken as a warning. There are a lot of reasons why the code quality from offshore teams has such a bad reputation, but they all boil down to variations on the Principal-Agent Problem: the people writing the code (the agents) don't have their goals aligned with your company (the principal).
Magnus M recently inherited some C# code which came from the offshore team, and it got principal-agented all over.
///
/// License Person CompanyName
///
private string ErrorMsg
{
    get
    {
        return "It is not possible to connect to the license server at this time." + Environment.NewLine
            + Environment.NewLine +
            "Please try again later or contact customer service for help at info@domain.com"
            + Process.Start("mailto:info@domain.com");
    }
}
When I started reading this code, I just got annoyed at the Environment.NewLine calls, and was thinking about how formatting an error message like this right in the code is such an awful code smell, but it's hardly a WTF- until I got to Process.Start("mailto:info@domain.com").
As the name implies, Process.Start starts a process. It's normally used to execute external programs, but here we pass a URL to it. Since this software runs on Windows, it should trigger the OS to open the default mail program, if there's one assigned. If there isn't, your attempt to access the ErrorMsg property just threw an unhandled exception.
There is no sensible reason why accessing a read-only property should launch a mail program. Even if "hey, just start a mail program when things go wrong," were an acceptable UX choice (spoilers: it isn't), this is so far away from "single responsibility principle" that it makes my head hurt.
Magnus adds:
There are many, many issues with the code, but I thought this snippet was a good representation of the general quality. … My favorite detail is probably the comment.
I don't know about you, but "License Person CompanyName" is actually the name I put on all my Git commits.
Tags: CodeSOD |
CodeSOD: Optimized |
In modern times, there's almost no reason to use Assembly, outside of highly specific and limited cases. For example, I recently worked on a project that uses a PRU, and while you can program that in C, I wanted to be able to count instructions so that I could get extremely precise timings to control LEDs.
In modern times, there's also no reason to use Delphi, but Andre found this code a few years ago, and has been puzzling over it ever since.
procedure tvAdd(var a,b:timevectortype; Range: Integer); register;
var
  i: Integer;
  pa,pb: PDouble;
begin
  i:=succ(LongRec(Range).Lo-LongRec(Range).Hi);
  pa:=@a[LongRec(Range).Hi];
  pb:=@b[LongRec(Range).Hi];
  asm
    mov ecx, i
    mov eax, [pa]
    mov edx, [pb]
  @loop:
    fld qword ptr [eax]
    fadd qword ptr [edx]
    fstp qword ptr [eax]
    add eax,8
    add edx,8
    dec ecx
    jnz @loop
    wait
  end;
  { for i:=starts to ends do
      a[i] := a[i] + b[i]; }
end;
The curly brackets at the end are a comment- they're telling us what the original Delphi code looked like, and it's pretty straightforward: loop across two lists, add them and store the result in the first list. The Assembly code was used to replace that to "boost performance". This code is as optimized as it can possibly be… if you ignore that it's not.
Now, at its core, the real problem is that we've replaced something fairly readable with something nigh incomprehensible for what is likely to be a very minor speedup. But this is actually worse: the assembly version is between 2 and 5 times slower.
The Assembly version also has a pretty serious bug. If i, the length of the range we want to add across, is zero, we'll load that into the register ecx. We'll still attempt to add values from lists a and b together, even though we probably shouldn't, and then we'll decrement the contents of ecx. So now it's -1. The jnz, or "jump if not zero", checks the result of that decrement, and since it's not zero, it'll pop back up to the @loop label, and keep looping until ecx wraps all the way around and eventually hits zero again.
Talk about a buffer overrun.
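To make that failure mode concrete, here's a rough JavaScript model of the two loop shapes. This is not Andre's code: the function names are made up for illustration, and the counter is only 8 bits wide (the real ecx register is 32 bits) so that the wraparound finishes quickly.

```javascript
// Hypothetical model of the assembly loop: the body runs BEFORE the counter
// is tested, just like `dec ecx` followed by `jnz @loop`.
// Assumption: an 8-bit counter stands in for the 32-bit ecx register.
function iterationsBottomTested(count) {
  const counter = new Uint8Array(1); // decrementing 0 wraps around to 255
  counter[0] = count;
  let iterations = 0;
  do {
    iterations++;   // stands in for the fld / fadd / fstp body
    counter[0]--;   // dec ecx
  } while (counter[0] !== 0); // jnz @loop
  return iterations;
}

// The commented-out Delphi `for` loop tests the bounds before running the body.
function iterationsTopTested(count) {
  let iterations = 0;
  for (let i = 0; i < count; i++) {
    iterations++;
  }
  return iterations;
}

console.log(iterationsBottomTested(3)); // 3
console.log(iterationsBottomTested(0)); // 256: the counter wrapped all the way around
console.log(iterationsTopTested(0));    // 0
```

With a 32-bit register, the same wraparound takes roughly four billion iterations, each one reading and writing eight bytes further past the end of the arrays.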
Now, as it turns out, playing with the Range object did turn out to be kind of expensive, so Andre did fix the code with an optimization: he used integers instead.
procedure tvAdd(var a,b:timevectortype; afrom, ato: Integer); register;
var
  i: Integer;
begin
  for i := afrom to ato do
    a[i] := a[i]+b[i];
end;
Tags: CodeSOD |
The Therac-25 Incident |
A few months ago, someone noted in the comments that they hadn't heard about the Therac-25 incident. I was surprised, and went off to do an informal survey of developers I know, only to discover that only about half of them knew what it was without searching for it.
I think it's important that everyone in our industry know about this incident, and upon digging into the details I was stunned by how much of a WTF there was.
Today's article is not fun, or funny. It describes incidents of death and maiming caused by faulty software engineering processes. If that's not what you want today, grab a random article from our archive, instead.
When you're strapping a patient to an electron gun capable of delivering a 25MeV particle beam, following procedure is vitally important. The technician operating the Therac-25 radiotherapy machine at the East Texas Cancer Center (ETCC) had been running this machine, and those like it, long enough that she had the routine down.
On March 21, 1986, the technician brought a patient into the treatment room. She checked their prescription, and positioned them onto the bed of the Therac-25. Above the patient was the end-point of the emitter, a turntable which allowed her to select what kind of beam the device would emit. First, she set the turntable to a simple optical laser mode, and used that to position the patient so that the beam struck a small section of his upper back, just to one side of his spine.
With the patient in the correct position, she rotated the turntable again. There were two other positions. One would position an array of magnets between the beam and the patient; these would shape and aim the beam. The other placed a block of metal between the beam and the patient. When struck by a 25MeV beam of electrons, the metal would radiate X-rays.
This patient's prescription was for an electron beam, so she positioned the turntable and left the room. In the room next door, shielded from the radiation, was the control terminal. The technician started keying in the prescription to begin the treatment.
If things were exactly following the routine, she'd be able to communicate with the patient via an intercom, and monitor the patient via a video camera. Sadly, that system had broken down today. Still, this patient had already had a number of treatments, so they knew what to expect, so that communication was hardly necessary. In fact, the Therac-25 and all the supporting equipment were always finicky, so "something doesn't work" practically was part of the routine.
The technician had run this process so many times she started keying in the prescription. She'd become an extremely fast typist, at least on this device, and perhaps too fast. In the field for beam type, she accidentally keyed in "X", for "x-ray". It was a natural mistake, as most patients got x-ray treatments, and it wasn't much of a problem: the computer would see that the turntable was in the wrong position and refuse to dose the patient. She quickly tapped the "UP" arrow on the keyboard to return to the field, corrected the value to "E", for electron, and confirmed the other parameters.
Her finger hovered over the "B" key on the keyboard while she confirmed her data entry. Once she was sure everything was correct, she pressed "B" for "beam start". There was no noise, there never was, but after a moment, the terminal read: "Malfunction 54", and then "Treatment Pause".
Error codes were no surprise. The technicians kept a chart next to the console, which documented all the error codes. In this case, "Malfunction 54" meant a "dose input 2" error.
That may not have explained anything, but the technician was used to the error codes being cryptic. And this was a "treatment pause", which meant the next step was to resume treatment. According to the terminal, no radiation had been delivered yet, so she hit the "P" key to unpause the beam.
That's when she heard the screaming.
The patient had been through a number of these sessions already, and knew they shouldn't feel a thing. The first time the technician activated the beam, however, he felt a burning sensation, which he later described like "hot coffee" being poured on his back. Without any intercom to call for help, he started to get off the treatment table. He was still extricating himself, screaming for help, when the technician unpaused the beam, at which point he felt something like a massive electric shock.
That, at first, was the diagnosis. A malfunction in the machine must have delivered an electric shock. The patient was sent home, and the hospital physicist examined the Therac-25, confirming everything was in working order and there were no signs of trouble. It didn't seem like it would happen again.
The patient had been prescribed a dose of 180 rads as part of a six-week treatment program that would deliver 6,000 rads in total. According to the Therac-25, the patient had received an underdose, a fraction of that radiation. In reality, the malfunction had delivered between 16,000 and 25,000 rads. The patient seemed fine, but in fact, they were already dead, and no one knew it yet.
The ETCC incident was not the first, and sadly was not the last malfunction of the Therac-25 system. Between June 1985 and July 1987, there were six accidents involving the Therac-25, manufactured by Atomic Energy of Canada Limited (AECL). Each was a severe radiation overdose, which resulted in serious injuries, maimings, and deaths.
As the first incidents started to appear, no one was entirely certain what was happening. Radiation poisoning is hard to diagnose, especially if you don't expect it. As with the ETCC incident, the machine reported an underdose despite overdosing the patient. Hospital physicists even contacted AECL when they suspected an overdose, only to be told such a thing was impossible.
A few weeks later, there was a second overdose at ETCC, and it was around that time that the FDA and the press started to get involved. Early on, there was a great deal of speculation about the cause. Of interest is this comment from the RISKS mailing list from 1986.
Here is my speculation of what happened: I suspect that the current in the electron beam is probably much greater in X-ray mode (because you want similar dose rates in both modes, and the production of X-rays is more indirect). So when you select X-rays, I'll bet the target drops into place and the beam current is boosted. I suspect in this case, the current was boosted before the target could move into position, and a very high current electron beam went into the patient.
How could this be allowed to happen? My guess is that the software people would not have considered it necessary to guard against this failure mode. Machine designers have traditionally used electromechanical interlocks to ensure safety. Computer control of therapy machines is a fairly recent development and is layered on top of, rather than substituting for, the old electromechanical mechanisms.
The Therac-25 was the first entirely software-controlled radiotherapy device. As that quote from Jacky above points out: most such systems use hardware interlocks to prevent the beam from firing when the targets are not properly configured. The Therac-25 did not.
The software included a number of key modules that ran on a PDP-11. First, there were separate processes for handling each key function of the system: user input, beam alignment, dosage tracking, etc. Each of these processes was implemented in PDP-11 Assembly. Governing these processes was a real-time OS, also implemented in Assembly. All of this software, from the individual processes to the OS itself, was the work of a single software developer.
AECL had every confidence in this software, though, because it wasn't new. The earliest versions of the software appeared on the Therac-6. Development started in 1972, and the software was adapted to the Therac-25 in 1976. The same core was also used on the Therac-20. Within AECL, the attitude was that the software must be safe because they'd been using it for so long.
In fact, when AECL performed their own internal safety analysis of the Therac-25 in 1983, they did so with the following assumptions:
1) Programming errors have been reduced by extensive testing on a hardware simulator, and under field conditions on teletherapy units. Any residual software errors are not included in the analysis.
2) Program software does not decay due to wear, fatigue, or reproduction errors.
3) Computer software errors are caused by faulty hardware components, and "soft" (random) errors induced by alpha particles or electromagnetic noise.
In other words: we've used the software for a long time and software always copies and deploys perfectly. So, any bugs we see would have to be transient bugs caused by radiation or hardware errors.
After the second incident at ETCC, the hospital physicist took the Therac-25 out of service and worked with the technician to replicate the steps that caused the overdose. It wasn't easy to trigger the "Malfunction 54" error message, especially when they were trying to methodically replicate the exact steps, because as it turned out, if you entered the data slowly, there were no problems.
To trigger the overdose, you needed to type quickly, the kind of speed that an experienced operator might have. The physicist practiced until he could replicate the error, then informed AECL. While he was taking measurements to see how large the overdoses were, AECL called back. They couldn't replicate the issue. "It works on my machine," essentially.
After being coached on the required speed, the AECL technicians went back to it, and confirmed that they could trigger an overdose. When the hospital physicist took measurements, they found roughly 4,000 rads in the overdose. AECL, doing similar tests, triggered overdoses of 25,000 rads. The reality is that, depending on the timing, the output was potentially random.
With that information, the root cause was easier to understand: there was a race condition. Specifically, when the technician mistyped "X" for x-ray, the computer would calculate out the beam activation sequence to deliver a high-energy beam to create x-rays. When the technician hit the "UP" arrow to correct their mistake, it should've forced a recalculation of that activation sequence—but if the user typed too quickly, the UI would update and the recalculation would never happen.
By the middle of 1986, the Food and Drug Administration (FDA) was involved, and demanded that AECL provide a Corrective Action Plan (CAP). What followed was a lengthy process of revisions as AECL would provide their CAP and the FDA would follow up with questions, resulting in new revisions to the CAP.
For example, the FDA reviewed the first CAP revision and noted that it was incomplete. Specifically, it did not include a test plan. AECL responded:
no single test plan and report exists for the software since both hardware and software were tested and exercised separately and together for many years.
The FDA was not pleased with that, and after more back and forth, replied:
We also expressed our concern that you did not intend to perform the [test] protocol to future modifications to the software. We believe that rigorous testing must be performed each time a modification is made to ensure the modification does not adversely affect the safety of the system.
While AECL struggled to include complex tasks like testing in their CAP, they had released instructions that allowed for a temporary fix to prevent future incidents. Unfortunately, in January, 1987, there was another incident, caused by a different software bug.
In this bug, there was a variable shared by multiple processes, meant as a flag to decide whether or not the beam collimator in the turntable needs to be checked to ensure everything is in the correct position. If the value is non-zero, the check needs to be performed. If the value is zero, it does not. Unfortunately, the software would increment the field, and the field was only one byte wide. This meant every 256th increment, the variable would be zero when it should have been non-zero. If that incorrect zero lined up with an operator action, the beam would fire at full energy without the turntable in the right position.
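A one-byte counter used as a flag is easy to model. The sketch below is JavaScript, not the original PDP-11 Assembly, and every name in it is hypothetical. It assumes the simplest possible reading of the description above: the flag is bumped each time a check is requested and is read immediately afterwards. It also includes the "just set the value" fix that AECL later proposed.

```javascript
// Hypothetical model of the shared one-byte flag. A Uint8Array element holds
// 0..255 and silently wraps back to 0 on the 256th increment.
const flag = new Uint8Array(1);

// Buggy approach: signal "a check is required" by incrementing the byte.
function requestCheckByIncrement() {
  flag[0]++;
}

// Fixed approach: assign a constant non-zero value instead of incrementing.
function requestCheckBySet() {
  flag[0] = 1;
}

// Non-zero means the position check must be performed.
function checkRequired() {
  return flag[0] !== 0;
}

// Count how often a request leaves the flag reading "no check needed".
function countMissedChecks(request, times) {
  flag[0] = 0;
  let missed = 0;
  for (let n = 0; n < times; n++) {
    request();
    if (!checkRequired()) {
      missed++;
    }
  }
  return missed;
}

console.log(countMissedChecks(requestCheckByIncrement, 1024)); // 4
console.log(countMissedChecks(requestCheckBySet, 1024));       // 0
```

Four misses out of 1,024 sounds small, which is exactly why a bug like this can survive years of "we ran it and nothing went wrong" testing.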
AECL had a fix for that (stop incrementing and just set the value), and amended their CAP to include that fix. The FDA recognized that was probably going to fix the problem, but still had concerns. In an internal memo:
We are in the position of saying that the proposed CAP can reasonably be expected to correct the deficiencies for which they were developed (Tyler). We cannot say that we are [reasonably] confident about the safety of the entire system…
This back-and-forth continued through a number of CAP revisions. At each step in the process, the FDA found issues with testing. AECL's test process up to this point was simply to run the machine and note if anything went wrong. Since the software had been in use, in some version, for over a decade, they did not see any reason to test the software, and thus had no capacity or plan for actually testing the software when the FDA required it.
The FDA, reviewing some test results, noted:
Amazingly, the test data presented to show that the software changes to handle the edit problems in the Therac-25 are appropriate prove the opposite result. … I can only assume the fix is not right, or the data were entered incorrectly.
Eventually, the software was fixed. Legislative and regulatory changes were made to ensure incidents like this couldn't happen in the future, at least not the same way.
It's worth noting that there was one developer who wrote all of this code. They left AECL in 1986, and thankfully for them, no one has ever revealed their identity. And while it may be tempting to lay the blame at their feet—they made every technical choice, they coded every bug—it would be wildly unfair to do that.
With AECL's continued failure to explain how to test their device, it should be clear that the problem was a systemic one. It doesn't matter how good your software developer is; software quality doesn't appear because you have good developers. It's the end result of a process, and that process informs not only your software development practices, but also your testing. Your management. Even your sales and servicing.
While the incidents at the ETCC finally drove changes, they weren't the first incidents. Hospital physicists had already reported problems to AECL. At least one patient had already initiated a lawsuit. But that information didn't propagate through the organization; no one put those pieces together to recognize that the device was faulty.
On this site, we joke a lot at the expense of the Paula Beans and Roys of this world. But no matter how incompetent, no matter how reckless, no matter how ignorant the antagonist of a TDWTF article may be, they're part of a system, and that system put them in that position.
Failures in IT are rarely individual failures. They are process failures. They are systemic failures. They are organizational failures. The story of AECL and the Therac-25 illustrates how badly organizational failures can end up.
AECL did not have a software process. They didn't view software as anything more than a component of a larger whole. In that kind of environment, working on safety critical systems, no developer could have been entirely successful. Given that this was a situation where lives were literally on the line, building a system that produced safe, quality software seems like it should have been a priority. It wasn't.
While the Therac-25 incident is ancient history, software has become even more important. While we would hope safety-critical software has rigorous processes, we know that isn't always true. The 737MAX is an infamous, recent example. But with the importance of software in the modern world, even more trivial software problems can get multiplied at scale. Whether it's machine learning reinforcing racism, social networks turning into cesspools of disinformation or poorly secured IoT devices turning into botnets, our software exists and interacts with the world, and has real world consequences.
If nothing else, I hope this article makes you think about the process you use to create software. Is the process built to produce quality? What obstacles to quality are there? Is quality a priority, and if not, why not? Does your process consider quality at scale? You may know your software's failure modes, but do you understand your organization's failure modes? Its blind spots? The assumptions it makes which may not be valid in all cases?
Let's return for a moment to the race condition that caused the ETCC incidents. This was caused by users hitting the up arrow too quickly, preventing the system from properly registering their edits. While the FDA CAP process was grinding along, AECL wanted to ensure that people could still use the Therac-25 safely, and that meant publishing quick fixes that users could apply to their devices.
This is the letter AECL sent out to address that bug:
SUBJECT: CHANGE IN OPERATING PROCEDURES FOR THE THERAC-25 LINEAR ACCELERATOR
Effective immediately, and until further notice, the key used for moving the cursor back through the prescription sequence (i.e., cursor "UP" inscribed with an upward pointing arrow) must not be used for editing or any other purpose.
To avoid accidental use of this key, the key cap must be removed and the switch contacts fixed in the open position with electrical tape or other insulating material.
For assistance with the latter you should contact your local AECL service representative.
Disabling this key means that if any prescription data entered is incorrect, then "R" reset command must be used and the whole prescription reentered.
For those users of the Multiport option, it also means that editing of dose rate, dose, and time will not be possible between ports.
On one hand, this is a simple instruction that would effectively prevent the ETCC incidents from reoccurring. On the other, it's terrifying to imagine a patient's life hanging on a ripped up keycap and electrical tape.
This article is intended as a brief summary of the incident. Most of the technical details in this article come from this detailed account of the Therac-25 incident. That is the definitive source on the subject, and I recommend reading the whole thing. It contains much more detail, including deeper dives into the software choices and organizational failures.
Tags: Feature Articles |
Error'd: Sweet Sweet Summertime |
Gastronome Carl hungrily drools "I haven't measured the speed of a snail but it's gotta be close. "
While Dan B. rats out Petco's dwindling discount
And William Blair wonders "Does this mean I'm actually UP 2%? "
Comics Fan Ken Mitchell seeks caped cave-cleaning customer support but "can't find NaN-aN-aN on my calendar!"
Yet at the end of the day, amateur meteorologist Esther lets us know "I noticed that summer will start early this year" because Brian needed a head start with Carl's dinner, Gary!
Tags: Error'd |
CodeSOD: Self Improvement in Stages |
Jake has a co-worker named "Eddie". Eddie is the kind of person who is always hoping to change and get better. They're gonna start eating healthier… after the holidays. They're gonna start doing test driven development… on the next project. They'll stop just copying and pasting code… someday.
At least, that's what we can get from this blob of code.
//TODO make this recursive, copy paste works for now though
if (website_description != null) {
    if (website_description.length() > 25) {
        int i = website_description.indexOf(" ", 20);
        if (i != -1) {
            String firstsplit = website_description.substring(0, i);
            String secondsplit = website_description.substring(i);
            websiteWrapped = firstsplit + "
" + secondsplit;
            if (secondsplit.length() > 25){
                int split_two = secondsplit.indexOf(" ", 20);
                String part1 = secondsplit.substring(0, split_two);
                String part2 = secondsplit.substring(split_two);
                websiteWrapped = firstsplit + "
" + part1 + "
" + part2;
                if (part2.length() > 25){
                    int split_three = part2.indexOf(" ", 20);
                    String part3 = part2.substring(0, split_three);
                    String part4 = part2.substring(split_three);
                    websiteWrapped = firstsplit + "
" + part1 + "
" + part3 + "
" + part4;
                    if (part4.length() > 25){
                        int split_four = part4.indexOf(" ", 20);
                        String part5 = part4.substring(0, split_four);
                        String part6 = part4.substring(split_four);
                        websiteWrapped = firstsplit + "
" + part1 + "
" + part3 + "
" + part5+ "
" + part6;
                        if (part6.length() > 25){
                            int split_five = part6.indexOf(" ", 20);
                            String part7 = part6.substring(0, split_five);
                            String part8 = part6.substring(split_five);
                            websiteWrapped = firstsplit + "
" + part1 + "
" + part3 + "
" + part5+ "
" + part7+ "
" + part8;
                            if (part8.length() > 25){
                                int split_six = part8.indexOf(" ", 20);
                                String part9 = part8.substring(0, split_six);
                                String part10 = part8.substring(split_six);
                                websiteWrapped = firstsplit + "
" + part1 + "
" + part3 + "
" + part5+ "
" + part7+ "
" + part9+ "
" + part10;
                                if (part10.length() > 25){
                                    int split_seven = part10.indexOf(" ", 20);
                                    String part11 = part10.substring(0, split_seven);
                                    String part12 = part10.substring(split_seven);
                                    websiteWrapped = firstsplit + "
" + part1 + "
" + part3 + "
" + part5+ "
" + part7+ "
" + part9+ "
" + part11+ "
"+ part12;
                                }
                            }
                        }
                    }
                }
            }
        } else {
            websiteWrapped = website_description;
        }
    } else {
        websiteWrapped = website_description;
    }
}
It is, of course, the comment which makes this sample: //TODO make this recursive, copy paste works for now though. But I would argue that recursion wouldn't actually help that much, not if we're gonna keep building every string via string concatenation.
I'm sure the comment is accurate: it works for now. I'm afraid, though, that it's probably going to keep working like this for a much, much longer period of time.
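For what it's worth, neither recursion nor copy/paste is required here. Below is a rough JavaScript sketch (not Eddie's Java) of the same splitting rule as a plain loop that collects the pieces in an array and joins them once. The function name and the separator parameter are assumptions; whatever separator the original code used isn't recoverable from the snippet.

```javascript
// Hypothetical loop-based version of the same rule: while the remaining text
// is longer than 25 characters, break at the first space at or after index 20.
// `separator` is an assumption; pass whatever line break the output needs.
function wrapDescription(description, separator = "\n") {
  if (description == null) {
    return description;
  }
  const lines = [];
  let rest = description;
  while (rest.length > 25) {
    const i = rest.indexOf(" ", 20);
    if (i === -1) {
      break; // no space to break on, so keep the remainder whole
    }
    lines.push(rest.substring(0, i));
    rest = rest.substring(i); // like the original, this keeps the leading space
  }
  lines.push(rest);
  return lines.join(separator);
}

const sample = "aaaaaaaaaa bbbbbbbbbb cccccccccc dddddddddd";
console.log(wrapDescription(sample, "|"));
// aaaaaaaaaa bbbbbbbbbb| cccccccccc dddddddddd
```

Unlike the nested version, this handles any number of lines and doesn't fall over when a later chunk has no space to split on.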
Tags: CodeSOD |
CodeSOD: Stocking Up |
Sometimes, you find some code that almost works, that almost makes sense. In a way, that's worse than just plain bad code. Ren'e was recently going through some legacy JavaScript code for their warehouse management system.
Like any such warehousing system, there's a problem you have to solve: sometimes, the number of units you need to pick to complete the order is larger than the stock you have available. At that point, you need to make a decision: do you hold the order until stock comes in, do you partially fill it and then follow up with a second shipment, or do you perhaps just cancel the order?
Ren'e found a line like this:
pick.qty_to_pick -= pick.qty_to_pick - stock.available
So, if I want to pick 100 units, but only have 25 in stock, I'll decrement the qty_to_pick by 75. Which vaguely makes sense, but also is a weird and awkward way of saying "make the qty_to_pick equal to the stock.available".
I assume there are guards around this line which make sure it's executed only if the qty_to_pick is greater than the stock.available. At least, I hope so, because if not, some customers are going to be surprised by the quantity when their order arrives.
In the end, this code isn't strictly wrong, it's just the weirdest, most awkward way of copying one value to another variable.
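If the intent really is "never pick more than we have", one way to say that directly is to clamp with Math.min, which also removes the need for a separate guard. This is only a sketch: the helper names are invented, and the object shapes are assumed from the one line we've seen.

```javascript
// The original expression, wrapped in a function for comparison.
// Algebraically it always leaves qty_to_pick equal to stock.available.
function originalAdjustment(pick, stock) {
  pick.qty_to_pick -= pick.qty_to_pick - stock.available;
  return pick;
}

// Hypothetical alternative: clamp the pick quantity to the available stock.
function clampPickToStock(pick, stock) {
  pick.qty_to_pick = Math.min(pick.qty_to_pick, stock.available);
  return pick;
}

const stock = { available: 25 };
console.log(clampPickToStock({ qty_to_pick: 100 }, stock).qty_to_pick);  // 25
console.log(clampPickToStock({ qty_to_pick: 10 }, stock).qty_to_pick);   // 10
console.log(originalAdjustment({ qty_to_pick: 10 }, stock).qty_to_pick); // 25, the surprise shipment
```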
Tags: CodeSOD |
The Economic Problem |
One of the main tasks any company needs to do is allocate resources. Regardless of the product or the industry they're in, they have to decide how to employ the assets they have to make money. No one has really "solved" this problem, and that's why there are swarms of resource planning systems, project management tools, and cultish trend-following.
After a C-suite shuffle at James B's employer, one of the newly installed C-level execs had some big ideas. They were strongly influenced by one of the two life-changing books, and not the one involving orcs. A company needs to allocate resources. The economy, as a whole, needs to allocate resources. If, on the economic level, we use markets to allocate resources because they're more efficient than planning, then we should use markets internally as well.
For the most part, and for most groups in the company, this was just a book-keeping change. Everyone kept doing the same thing, but now instead of each department getting email accounts for every employee, each department got a pile of money, and used that to pay for email accounts for each employee. Instead of just getting a computer as part of the hiring process, departments "rented" a computer from IT. It created a surprising amount of paperwork for the supposedly "efficient" market, but at least at first, it wasn't a problem.
Before long, though, the C-suite started to notice that a lot of money flowed in to the IT department, but very little flowed back out. The obvious solution, then, was to cut the IT budget entirely. It would fund itself using the internal market, selling its services to other departments in the company.
The head of IT reacted in a vaguely reasonable way: they jacked the internal billing rates as high as they could. Since they technically owned the PCs, they installed them with physical locks on the cases. If you wanted a hard drive replacement, you needed to go through IT. The problem is that IT had exclusive contracts with vendors, and those vendor SLAs were pretty generous- to the vendors. One HDD failure could take a PC down for weeks while you waited for a replacement.
James was a victim of one such incident. While using a loaner PC to do his work, he and his boss, Krista, got to talking about how frustrating this was. They were, after all, a software development team, and "having access to a computer, with all our software installed" was a priority.
"It makes me want to break the lock and replace the drive myself," James said. "It'd probably be cheaper too."
Krista laughed. "It'd be a lot cheaper. Heck, I could just buy you a new computer for what they charge to replace a hard drive."
Krista paused, then started mentally running the numbers. "Actually… I could do that." She immediately called a local vendor, a small company, and ordered a laptop for James. It arrived the next day, and once James set it up with his network credentials, he had full access to all the other IT services, like the shared drives.
Krista's team was one of the smaller teams in the company, but they needed a lot of IT services. Billed at the internal billing rates, that was a significant amount of money, and a big chunk of IT's budget came straight from Krista. But if she shopped around on her own, she could get everything- hardware, software licenses, basically everything but company email addresses and login credentials, for a fraction of the price.
And that's exactly what Krista did. She went through her department and found every piece of hardware they "leased" from IT, from PCs to network switches to even the cables, and replaced them.
The IT department wasn't happy about this. Most of their monthly spend was overhead that didn't change just because one tiny department stopped using their services. With Krista's team cutting off their funding, this meant IT had a budget crunch. Worse, other teams were starting to grumble.
This led to a call where the head of IT laid out an ultimatum to Krista: "If you don't purchase your infrastructure from us, we will cut off your team's access to the network entirely. You can't just be plugging in any device you like to the network, it's bad for security."
"That's fine," Krista replied. "We can work on our own private LAN, and when we need to give software releases to the distribution team, we'll just walk down the hall and drop off a thumb drive or a CD, instead of using the network drive."
"You can't do that!"
"Why not? You're trying to bill me six figures a year to deliver a service I can replace with a short walk down the hall."
While the war between Krista and IT raged, elsewhere in the company, similar battles played out. Krista may have fired the first shot, but the internal market became a war zone.
The division which made Product Line A had no interest in selling Product Line B, despite the products being complementary; their budget only made money when they sold A. Other departments tried to internalize other corporate functions- one department tried to spin up its own HR department, another stopped doing its primary job and just started selling accounting services to other departments. One of their hardware departments discovered that they could shift to reselling competitors' products and make more money that way, so they did.
Within a year, the internal market was canceled. The C-level executive who had pushed for it had already moved on to another C-suite in another giant company, and was still preaching the gospel of the internal market. Without that influence, James's company instituted a new "Company Family" policy, which promised "no departmental boundaries". People still used internal budgeting to help them allocate resources, but gone were the big piles of money that could just be spent however. No department was trying to make money off other departments. The grand experiment in internal capitalism was over.
Tags: Feature Articles |
News Roundup: Flash Point |
With nearly one month of 2021 in the books and the spectre of Covid-19 exhausting all of us, let’s do a quick inventory of the memorable moments of the past three months, shall we?
Wait...what was #3!?!? No it can’t be! Cue Elton John’s ‘Candle in the Wind’. RIP Flash.
As a child of the 90’s, I vividly remember merging onto the information superhighway and spending hours playing games on Newgrounds. And what technology made it possible to play audio and video in-browser back then? Flash.
The sheer number of games available, combined with my bad memory, makes it impossible for me to remember the names of any of these particular games, but check out the first Flash game ever created, a zombie game called AEvil. (For a full list of games click here.)
Flash’s ability to bring much needed interactivity to websites allowed it to stick around much longer than anyone could have predicted. (In fact YouTube used the technology in its first iteration back in 2005.) Eventually Flash developers just could not keep up with the demands of a rapidly evolving internet; security vulnerabilities, browser speed reduction, and mobile web issues eventually caught up to it.
To make matters worse, in 2007, Flash’s mobile incompatibility forced YouTube to abandon the technology in order to be included with the launch of the iPhone. Steve Jobs may have put the nail in the coffin for good in 2010 with his (in)famous ‘Thoughts on Flash’ open letter. The rest is history; now we have HTML5 to fill the gap (along with CSS and JavaScript) that Flash left behind.
I did my own digging, and while interest in Flash has fallen since the late 2000’s, I think the real “flash point” (if you will) occurs around the summer of 2015 when the CW show, ‘The Flash’ overtook ‘Adobe Flash’ in Google search interest. (Sadly Flash Gordon and its amazing theme song have never really attracted much interest since 2004, as far as Google’s search trend data goes.)
But fear not. There are still ways to scratch that itch of ‘90s and ‘00s Flash game nostalgia. And we will be left with years of IT hilarity.
Like the story from Dalian, China, where a 20-hour battle was waged at the local train station to revert a Flash update and get their systems back up and running, all while locals followed along via WeChat. The story has ups and downs, from when the team noticed something was wrong:
“1411 hours. The station is back in crisis. Once again, we cannot use the printer.”
To when the team identified the source of the problem:
“0816 hours: After calls and online searches, we confirmed the source of the issue is American company Adobe’s comprehensive ban of Flash content.”
To when they banded together to slay their common enemy:
“Jan. 13, 0113 hours: ‘Wan Jia Ling station is fixed!’ Ling Ma shouted…we all gathered and confirmed. The room burst with cheers and applause.”
What a ride. The best part is that they installed a pirated version of Flash to solve the problem.
Or how about the South African tax office having to build a custom web browser with Flash built in so that people could file their taxes? If you fail to plan, you plan to fail, people!
Enough of the hilarity; I think Mike Davidson objectively positions Flash the best in his obituary to the technology:
Flash, from the very beginning, was a transitional technology. It was a language that compiled into a binary executable. This made it consistent and performant, but was in conflict with how most of the web works. It was designed for a desktop world which wasn’t compatible with the emerging mobile web. Perhaps most importantly, it was developed by a single company. This allowed it to evolve more quickly for awhile, but goes against the very spirit of the entire internet. Long-term, we never want single companies — no matter who they may be — controlling the very building blocks of the web. The internet is a marketplace of technologies loosely tied together, each living and dying in rhythm with the utility it provides.
Most technology is transitional if your window is long enough. Cassette tapes showed us that taking our music with us was possible. Tapes served their purpose until compact discs and then MP3s came along. Then they took their rightful place in history alongside other evolutionary technologies. Flash showed us where we could go, without ever promising that it would be the long-term solution once we got there.
So here lies Flash. Granddaddy of the rich, interactive internet. Inspiration for tens of thousands of careers in design and gaming. Loved by fans, reviled by enemies, but forever remembered for pushing us further down this windy road of interactive design, lighting the path for generations to come.
Tags: News Roundup |
Error'd: We're Number 0th |
Drinker Philip B. confesses "The first bottle went down fine but after the second my speech got a little schlurred ..."
"Fortunately, we can rely on this detection originally developed to fight the plague!" advises an anonymous time traveler
Joel G. asks "I wonder if the $1003.99 fee is a maths error, a shipping error, or my postcode is a variable in calculating it? "
Polyglot Chris humblebrags "Duolingo thinks I'm better than number 1 in the league, I'm number 0!"
Airplane enthusiast Michael P. is sure he can make this toy fly for half the price. We must admit the gag went over our heads at first.
Tags: Error'd |
Coming to Grips |
Regardless of what industry you're in, every startup hits that dangerous phase where you're nearing the end of your runway but you still haven't gotten to the point where you can actually make money with your product. The cash crunch starts, and what happens next can often make or break the company.
Nathan was working for a biotech company that had hit that phase. They had a product, but they couldn't produce enough of it, cheaply enough, to actually make a profit. What they needed was some automation, and laboratory robots were the solution. But laboratory robots were expensive, and for a company facing a cash crunch, "expensive" was too risky. They needed a cheaper solution.
So they found Roy. Roy was an engineer and software developer, and Roy could build them a custom robot for much cheaper. After all, what was a lab robot but an arm with a few stepper motors and some control software? Roy turned around and shipped them a 2-axis robotic gripper arm ahead of schedule and under budget.
When Nathan joined the team, the arm wasn't working. Or, well, it kinda worked. Like a lot of such systems, the gripper had a "home" position. Between tasks, the gripper needed to return to that home position, and the way it was supposed to know that it was there was by checking a limit switch- a physical "button" that the arm would touch, telling the motor-control board that it should stop moving the arm.
For some reason, during the homing operation, the arm would stutter its way over, constantly stopping at random intervals, jerking and making godawful noises along the way. Nathan got Roy on the phone to talk through the symptoms and what was going on.
"So, when I was testing these," Roy said, "I found a fault in the motor-control boards. I think it's the whole batch of them, because every one I tried had the exact same behavior."
Nathan asked Roy to repeat that. "You think the entire lot of motor controllers you ordered from the vendor have QA failures?"
"It's the only explanation," Roy said. "It's certainly not anything in my software or any of my custom parts."
Nathan was almost certain that wasn't true.
Nathan examined the hardware while Roy continued his explanation. "So, anyway, what I was seeing was that the motor controller will report the home switch is hit, even when absolutely nothing is touching the home switch. So I added a work around; when the motor controller tells my software the switch is hit, I stop the motor, but then check to see if the switch is actually hit- and if it isn't, I keep moving."
"So, wait, when you stop the motor, the incorrect data goes away?"
"That's what I found in testing, yeah."
Nathan examined the wiring that connected the motor controller to the rest of the hardware- specifically the stepper motors and the limit switch. Roy had obviously wanted to keep his design "neat and clean", because he used a single, multi-conductor cable. From there, it wasn't hard for Nathan to figure out what was going on.
Some of the wires in that connector were just power and ground. One was for the limit switch- it'd read "high" if the switch were hit, and "low" otherwise. And the others were for the stepper motors. All of these wires were crammed together, with only a thin layer of insulation between them. The problem with that design was that stepper motors are controlled by sending PWM signals down the wire. Time-varying electrical fields have this seemingly magical power to induce current in nearby conductors, which is a fancy way of saying "putting PWM signals on one wire can induce current in a nearby wire", and also is one of the basic principles which anyone doing electrical design should know.
When the homing operation told the steppers to move, the signal to the steppers created interference in the home switch wire, causing the motor controller to think the home switch had been hit. By stopping the steppers, Roy also stopped the interference, so the re-check read correctly and the system could continue, at least until the next false trigger.
The fix was also simple: Nathan replaced the single multi-conductor cable with two shielded cables, isolating the home switch wire from interference.
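To see why Roy's workaround produced a stutter instead of a fix, here is a toy simulation of the homing loop as the story describes it. This is only a sketch: `homeArm`, `HomingResult`, and the `noisyWhileMoving` input are hypothetical names invented for illustration, and nothing here comes from Roy's actual control software.

```typescript
// Hypothetical model of the homing loop described above.
interface HomingResult {
  steps: number;      // motor steps actually taken
  falseStops: number; // times the arm halted without touching the switch
}

// startPosition: distance from the home switch, in steps.
// noisyWhileMoving[i]: whether crosstalk corrupts the switch line during step i.
function homeArm(startPosition: number, noisyWhileMoving: boolean[]): HomingResult {
  let position = startPosition;
  let steps = 0;
  let falseStops = 0;
  while (position > 0) {
    position -= 1;
    steps += 1;
    // While the steppers run, interference can make the switch line read "high".
    const readsHighWhileMoving = position === 0 || noisyWhileMoving[steps - 1] === true;
    if (readsHighWhileMoving) {
      // The workaround: stop the motor, then read the switch again.
      // With the motor stopped there is no interference, so this read is clean.
      const reallyHome = position === 0;
      if (!reallyHome) {
        falseStops += 1; // the stutter: a needless stop, then carry on
      }
    }
  }
  return { steps, falseStops };
}
```

Every noisy step costs a stop and a restart, which is exactly the jerking Nathan saw. Separating the cables removes the noise, so the re-check never has anything to catch.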
As a bonus example of what you get when you hire Roy, Nathan also supplied some of the sample code in Roy's custom robot scripting language. This sample promises to show you how to weigh every vial in a rack of vials:
For Vial = 1 to 96
    Rack.MoveToCell(XY, Vial)
    Gripper.PickUpVial()
    Balance.Tare()
    Balance.MoveToCell(XY, 1)
    MyWeight = Balance.Weight()
    PutVialBack(Vial)
Next
Roy calls this code "easy to understand and modify", but I'll let Nathan explain why this isn't true:
Why do the Rack and the Balance each have the ability to move the Gripper? Why is "PickUpVial()" a function on the gripper, but "PutBackVial()" is not part of a class? What's the logic and abstraction here? Much like the robot itself, this is code that looks easy to understand but in practice is hard to operate.
As for what happened to the startup that was so focused on cutting costs they were buying equipment from Roy? Well, Nathan doesn't say, but whether they survived or failed, it's still not really a happy ending either way, is it?
Tags: Feature Articles |
CodeSOD: Don't Do This |
Let's say you were writing a type checker in TypeScript. At some point, you would find that you need to iterate across various lists of things, like for example, the list of arguments to a function.
Now, JavaScript (and thus TypeScript) gives you plenty of options for building the right loop for your specific problem. Or, if you look at the code our anonymous submitter sent, you could just choose the wrongest one.
var i = 0;
var func1: string[] = func_e.get(expr.name);
if (func1.length != 0) {
    do {
        var a = typeCheckExpr(expr.arguments[i], env);
        if (a != func1[i]) throw new Error("Type mismatch in parameters");
        i += 1;
    } while (i != expr.arguments.length);
}
Now, it's tricky to reconstruct what the intent of the code is with these wonderfully vague variable names, but I'm fairly certain that func1 contains information about the definition of the function, while expr is the expression that we're type checking. So, if the definition of func1 doesn't contain any parameters, there's nothing to type-check, we skip the loop. Then, we use a do loop, because that if tells us we have at least one argument.
In the loop, we check the passed in argument against the function definition, and chuck an error if they don't match. Increment the counter, and then keep looping while there are more passed in arguments.
Our submitter claims "There's nothing strictly wrong about this snippet, and it all runs correctly," which may be true- I don't know enough about how this code is used, but I suspect that it's going to have weird and unexpected behaviors depending on the inputs, especially if the idea of "optional parameters" exists in the language they're type-checking (presumably TypeScript?).
But bugs aside, the core logic is: if the function takes parameters, iterate across the list of arguments and confirm they match the type. The do loop just confuses that logic, when the whole thing could be a much simpler for loop. As our submitter says, it's not wrong, but boy is it annoying. Annoying to read, annoying to parse, annoying because it should be a pretty simple block of code, but someone went and made it hard.
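For comparison, here's roughly what the simpler for loop might look like. This is a sketch, not the submitter's code: the `CallExpr` shape, the `checkCallArguments` name, and the one-argument stand-in for `typeCheckExpr` are all assumptions, since the original doesn't show those definitions. The explicit argument-count check is also an assumption that only holds if the language being checked has no optional parameters.

```typescript
// Hypothetical shapes: the original snippet doesn't show these types.
type TypeName = string;

interface CallExpr {
  name: string;
  arguments: unknown[];
}

// Stand-in for the submitter's typeCheckExpr (which also takes an env);
// here it just reports the JavaScript typeof the argument.
function typeCheckExpr(arg: unknown): TypeName {
  return typeof arg;
}

function checkCallArguments(expr: CallExpr, paramTypes: TypeName[]): void {
  // Assumption: no optional parameters, so the counts must match exactly.
  if (expr.arguments.length !== paramTypes.length) {
    throw new Error("Wrong number of arguments");
  }
  // A zero-parameter function simply never enters the loop.
  for (let i = 0; i < paramTypes.length; i++) {
    if (typeCheckExpr(expr.arguments[i]) !== paramTypes[i]) {
      throw new Error("Type mismatch in parameters");
    }
  }
}
```

The loop bound and the "nothing to check" case now live in one place, so a reader doesn't have to reconcile an if, a do, and a while to figure out when the body runs.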
Tags: CodeSOD |
Not-so-Portable Document Format |
Adrian worked for a document services company. Among other things, they provided high-speed printing services to clients in the financial services industry. That meant providing on-site service, which is how Adrian ended up with an office in the sub-sub-basement of a finance company. Adrian's boss, Lester, was too busy "developing high-end printing solutions on a Unix system" to spend any time in that sub-sub-basement, and instead embedded himself with the client's IT team.
"It's important that I'm working closely with them," Lester explained, "because it's the only way we can guarantee true inter-system compatibility." With disgust, he added, "They're mostly a Windows shop, and don't understand Unix systems, which is what drives our high-speed printing solution."
It was unclear to Adrian whether Lester was more interested in "working closely" or "getting access to the executive breakroom with free espressos", but that's what Lester got, while Adrian made do with a Mr. Coffee from 1987, while fielding emails from users trying to understand why their prints didn't work.
Bobbi was one such user. She was fairly technical, and had prepared some complex financial reports for printing. Because she was very aware how she wanted these reports to look, she'd gone the extra step and made them as a PDF. She'd sent it over to the high-speed-printer and it got kicked back with an error about invalid files. Adrian reviewed her PDFs, couldn't see any errors or problems, tried submitting the job himself, and a few minutes later it got kicked back.
Eventually, he called Lester.
"Hey, I've got a user trying to send some files over to the high-speed printer, and it doesn't seem like it'll take them."
"Oh, is that where all these PDFs have been coming from?"
"Uh… yes?"
Lester sighed. "See, this is why I need to be embedded with the team, they're so Windows biased, and now it's even infecting you."
"Hunh?"
Adrian somehow could hear Lester rolling his eyes over the phone. "The high speed printer is a Unix system, you know this."
"I do know that," Adrian confirmed, still mystified.
"PDFs are only good for the Windows operating system," Lester said. "It's not going to print properly on a Unix operating system."
"Our… high speed printer can't print PDFs?"
"If your users want to print PDFs, they need to print on their Windows-based printers."
"I just want to confirm," Adrian said, "again, our printer can't handle PDFs, the most common print format in the world, which is 100% supported by CUPS, and probably supported directly by the printer itself?"
"Adrian, this is why you're down in the sub-sub-basement doing support, you have a lot to learn about cross-platform interoperability."
Adrian related this information to Bobbi, and worked with her to convert the files into one of the "Unix-friendly" file formats Lester approved. After that, though, he did his own digging, and tried to understand why PDFs were forbidden.
It didn't take long. Lester handled all the print jobs through a set of homebrew shell scripts. Their main job was to prepend a banner page for the print job, but they also handled details about copying files, managing the queue, and had grown into a gigantic, unmanageable mess. It wasn't that Unix couldn't print PDFs, it was that Lester couldn't hack his already hacked scripts any further to support the Portable Document Format, and thus their high-speed print system couldn't handle the standard Unix printing format of PDFs.
Adrian eventually left that job. Lester, however, was still there, and so were his scripts.
https://thedailywtf.com/articles/not-so-portable-document-format
Tags: Feature Articles |
CodeSOD: Null and Terminated |
There's plenty of room for debate about what specific poor choices in history lead to the most bugs today. Was it the billion dollar mistake of allowing null pointers? Is it the absolute mess that is C memory management? Or is it C-style strings and all the attendant functions and buffer-overruns they entail?
A developer at Jay's company had been porting some C++ code to a new platform. That developer left, and the wheel-of-you-own-this-now spun and landed on Jay. The code was messy, but mostly functional. Jay was able to get it building, running, and then added a new feature. It was during testing that Jay noticed that some fields in the UI weren't being populated.
Jay broke out a memory analyzer tool, and it popped out warnings on lines where strlcpy was being called. Now that was odd, as strlcpy is the "good" way to copy strings, with guarantees that it would never allow buffer overruns. The buffers were all correctly sized, which left Jay wondering what exactly was wrong with the calls to strlcpy?
A quick grep through the code later, and Jay knew exactly what was wrong:
#define strlcpy strncpy
The code originally had been targeting a platform which had strlcpy available, but the port was moving to a platform which did not. The previous developer, out of laziness, ignorance, carelessness, or some combination of all of those, decided that since strlcpy and strncpy had the same calling semantics, a macro could solve all their problems.
If you haven't had to deal with C-strings, or just general C-style conventions, recently, it's important to note a few things. First, C doesn't actually have strings as a datatype, it just has an array of characters. Second, arrays are actually just pointers to the first item in the array, and C doesn't do anything to enforce the length, which means you're free to access element 11 in a 10 element array, and C will let you. Finally, since "knowing how long a string is" might actually be important, the way C-strings address the problems above is that the last character in the string should be a null terminator. All the string handling functions know that if they see a null terminator, that's the end of the string, and that keeps your code from reading off the end of the array into some other block of memory- or worse, writing to that arbitrary block of memory.
Which brings us to the key difference between strlcpy and strncpy: the first one is "safer" and guarantees that the last character in the output buffer is going to be a null terminator. strncpy makes no such guarantee; if there isn't room in the buffer for a null terminator, it just doesn't put one in.
In other words, with one macro, Jay's predecessor had created hundreds of buffer-overrun vulnerabilities. Jay removed the macro, properly updated the calls to safely copy strings, and the errors went away.
In any case, let's close with this quote, from the "Bugs" section of the strncpy/strcpy manpage, which is just a fun read:
If the destination string of a strcpy() is not large enough, then anything might happen. Overflowing fixed-length string buffers is a favorite cracker technique for taking complete control of the machine. Any time a program reads or copies data into a buffer, the program first needs to check that there's enough space. This may be unnecessary if you can show that overflow is impossible, but be careful: programs can get changed over time, in ways that may make the impossible possible.
Tags: CodeSOD |
Error'd: The Journey is the Destination |
"As if my Uber ride wasn't expensive enough on its own, apparently I have to go sightseeing East for a little while first," writes Pascal.
Mike S. wrote, "Rumors have it that Apple will release a newer, sleeker, better circle next month."
"Oh cool! Future me is playing AdVenture Capitalist on my Google account," Luke writes.
"While everybody is complaining about extreme shipping delays, my great grandmother is getting my Etsy order just in time for New Year's," wrote Gord S.
Kevin O. writes, "Not sure whether this is supposed to be joke or is PHP really crashing its Wikipedia page."
https://thedailywtf.com/articles/the-journey-is-the-destination
Tags: Error'd |
CodeSOD: Revenge of the Stream |
It's weird to call Java's streams a "new" feature at this point, but given Java's prevalence in the "enterprise" space, it's not surprising that people are still learning how to incorporate them into their software. We've seen bad uses of streams before, notably thanks to Frenk, and his disciple Grenk.
Well, one of Antonio's other co-workers "learned" their lesson from Frenk and Grenk. Well, they learned a lesson, anyway. That lesson was "don't, under any circumstances, use streams".
Unfortunately, they were so against streams, they also forgot about basic things, like how lists and for-loops work, and created this:
private List<Integer> createListOfDays(String monthAndYear)
{
    List<Integer> daysToRet = new ArrayList<>();
    Integer daysInMonth = DateUtils.daysInMonth(monthAndYear);
    for (Integer i = 1; i <= daysInMonth; i++)
    {
        daysToRet.add(i);
    }
    Set<Integer> dedupeCustomers = new LinkedHashSet<>(daysToRet);
    daysToRet.clear();
    daysToRet.addAll(dedupeCustomers);
    Collections.sort(daysToRet);
    return daysToRet;
}
The goal of this method is, given a month, it returns a list of every day in that month, from 1…31, or whatever is appropriate. The date-handling is all taken care of by daysInMonth, which means this is the rare date-handling code where the date-handling isn't the WTF.
No, the goal here is simply to populate an array with numbers in order, which the for-loop handles perfectly well. It's there, it's done, it's an entirely acceptable solution. Just return right after the for loop, and there's no problem at all with this code. You could just stop.
But no, we need to dedupeCustomers, which oh no, they just copied and pasted this code from somewhere else. In this case, to remove duplicates, they use a Set, or specifically a LinkedHashSet, which is one of the many set implementations Java offers as a built-in. A plain HashSet doesn't retain order; a LinkedHashSet preserves insertion order, and a TreeSet keeps its elements sorted.
I bring that ordering thing up, because we started with our list in sorted order, with no duplicates. We added it to a set, which kept the order we already had and removed the duplicates we don't have. Then, we clear the original list, jam the exact same data back into it, and then sort a list that was already sorted.
This code made Antonio angry, and dealing with Frenk's unholy streams also made him angry, so Antonio decided to not only fix this method, but use it to demonstrate a stream one-liner which wasn't a disaster:
private List<Integer> createListOfDays(String monthAndYear)
{
    return IntStream.rangeClosed(1, DateUtils.daysInMonth(monthAndYear)).collect(ArrayList::new, List::add, List::addAll);
}
Tags: CodeSOD |
Just Google It |
Based on the glowing recommendations of a friend, Philip accepted a new job in a new city. The new city was a wonderful change of pace. The new job, on the other hand…
The company was a startup, "running lean" and "making the best use of our runway". The CEO was a distant figure, but the CTO, Trey, was a much more hands-on "leader". Trey was part of the interview process, and was the final decision maker for hiring Philip. On Philip's first day, Trey commented on the fact that Philip specifically hadn't gotten a degree in software engineering, but had twenty years of work experience.
"Honestly, that really put you over the top," Trey said. He grinned the smile of someone who has spent a lot of money engineering the perfect smile, and clapped Philip on the back. "We tend to prefer candidates from, y'know, 'non-traditional' backgrounds. I mean, I have a degree in logic!"
Philip nodded awkwardly, not exactly sure what to make of that. Trey clapped him on the back again, and added, "But as you can see, I've made quite the career in tech. I think my background gives me a better perspective than someone who's been too focused on the bits and bytes, you know? Broader. Why, it's certainly helped me make connections. I know people at Google! Maybe I'll introduce you, if you promise not to go running off!" Trey laughed at his own joke.
Or maybe it wasn't a joke. Philip's first few months at the company were mostly meetings. Some of those meetings were about company processes and standards, which Trey had copied from what he heard Google did. Sometimes, the meetings were more like propaganda sessions, focusing on how much this company was like Google, and would one day be as successful as Google, and how lucky you were to get in on the ground floor of the next Google.
A number of the meetings focused on security, and these meetings generally had a darker, more threatening tone. "Our intellectual property," Trey explained, "belongs to our investors. We must protect it at all costs."
Once Philip was properly indoctrinated into the company cult, and fully warned about security concerns, he was given access to the company's private GitLab, hosted on the Google Cloud Platform. The first thing Philip noticed was that the installed version of GitLab was from 2015, and there were a number of documented vulnerabilities that had since been patched in newer versions.
So much for security.
Their product was one gigantic Eclipse project. Emphasis on Eclipse. There were no automated builds. If you wanted to build, you used Eclipse. There were no automated deployments. If you wanted to deploy, you used Eclipse. Philip ran some automated analysis tools against the codebase, just to help him get a sense of what he was looking at.
About 45% of the code was duplicated code from elsewhere in the codebase. One-letter variable names were apparently the standard. There was no testing code, whatsoever. And every single third-party library was included in source control, creating a git repo that was over 2GB in size.
Philip wasn't given much direction on what he should work on next. "We want to let smart people do smart things," Trey said, "like at Google. You set direction, and keep me in the loop so I can course correct."
Philip decided that the first thing they needed was some automation on the builds. Then they could move up to continuous integration. It'd also be nice to get dependency management cleaned up so instead of tracking all of your dependencies in git, you could have your build/deploy system handle that.
"So," Philip said to Trey after explaining this, "I thought I'd get started on building with Maven. That's pretty much the standard tool for Java projects like this, I'm already familiar with it, the rest of the team is too-"
"I don't think that's what they use at Google," Trey said.
"Well, I mean…"
"No, no, let me make a few calls. I know people at Google."
While Trey went ahead and made his calls, Philip got to work building a Maven build anyway. It turned out to be more complicated than he expected at first, especially because the code depended on some deprecated Google Cloud Platform libraries, which meant Philip had to also modernize some of that as part of the process, which meant starting on writing some unit tests to avoid introducing regressions, and it sorta turned into a yak shaving expedition.
"Hey," Trey said, "at Google they use Bazel, so we should use that."
"Um, I mean, none of us are really familiar, and Maven is really fit for purpo-"
"At Google," Trey repeated, "they use Bazel."
Philip and the rest of the team gave Bazel a fair shot, and spent the better part of a week trying to get their project configured in a way that worked with Bazel or configure Bazel in a way that worked with their project, and the end result was confused, angry and frustrated developers.
"I understand that this is what Google uses, and it might be a great fit for them, but it's really not a great fit for this project or this team," Philip explained to Trey. "I think we can get to a Maven version in just another day or two, and it'll give us all the benefits we want."
"Well, I think we should do what Google does, but we have another problem," Trey said. "It seems that you merged some code to master."
"Uh, yeah, just some unit tests, and the team reviewed them."
"But I didn't review them," Trey said. "So I'm going to have to call a code freeze until I get a chance to understand the changes you made. No more changes until I've done that."
"Do they do code freezes at Google?" Philip wondered.
"Of course they do."
That was the last time Philip saw the CTO in person. Instead of being busy studying the code, however, Trey was busy gladhanding with investors, showing up for photo-ops with trade mags, and scheduling media appearances to talk up how this was the next Google.
In the end, the code freeze lasted five months, which was longer than Philip lasted. He found another job. The startup is still running, but Trey is no longer the CTO. He's now the CEO, and in charge of the entire operation.
Tags: Feature Articles |
CodeSOD: Table This for a Moment |
Relational databases have very different constraints on how they structure and store data than most programming languages- tables and relationships don't map neatly to objects. They also have very different ways in which they can be altered. With software, you can just release a new version, but a database requires careful alterations lest you corrupt your data.
There are many ways to try and address that mismatch in our software, but sometimes the mismatch isn't in our software, it's in our brains.
Peter was going through an older database, finally migrating its schema definition into scripts that can be source-controlled. This particular database had never received any care from the database team, and instead all of the data modeling was done by developers. Developers who might not have been quite ready to design a database system from scratch.
The result was this:
mysql> DESC `preferences`;
+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| user_id      | int(11)      | YES  | MUL | NULL    |                |
| time_format  | varchar(255) | YES  |     | 24      |                |
| date_format  | varchar(255) | YES  |     | eu      |                |
| boolean_val1 | tinyint(1)   | YES  | MUL | 0       |                |
| boolean_val2 | tinyint(1)   | YES  | MUL | 0       |                |
| boolean_val3 | tinyint(1)   | YES  | MUL | 0       |                |
| boolean_val4 | tinyint(1)   | YES  | MUL | 0       |                |
| boolean_val5 | tinyint(1)   | YES  | MUL | 0       |                |
| boolean_val6 | tinyint(1)   | YES  | MUL | 0       |                |
| boolean_val7 | tinyint(1)   | YES  | MUL | 0       |                |
| boolean_val8 | tinyint(1)   | YES  | MUL | 0       |                |
| boolean_val9 | tinyint(1)   | YES  | MUL | 0       |                |
+--------------+--------------+------+-----+---------+----------------+
So, let's start with the boolean_val* fields. As you can see, they're helpfully named in a way that gives you absolutely no idea what they might be used for. Obviously, as this is the preferences table, they should be storing some sort of preference. Well, at least one of them is- boolean_val2 has a mix of ones and zeroes. All the other boolean_val* fields just store zeroes. Is that because they're unused? Or because all the users have left the corresponding preference at its default value? Nobody knows!
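For anyone who wants to repeat that check, counting the distinct values in each column is enough to show which ones ever vary. What follows is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for Peter's MySQL server, the sample rows are invented, and the helper name distinct_counts is hypothetical.

```python
import sqlite3

# Stand-in for the real MySQL table: an in-memory SQLite database
# holding a few made-up rows (assumption, not Peter's actual data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE preferences ("
    "id INTEGER PRIMARY KEY, "
    "boolean_val1 INTEGER DEFAULT 0, "
    "boolean_val2 INTEGER DEFAULT 0, "
    "boolean_val3 INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO preferences (boolean_val1, boolean_val2, boolean_val3) "
    "VALUES (?, ?, ?)",
    [(0, 1, 0), (0, 0, 0), (0, 1, 0)],
)


def distinct_counts(conn, table, columns):
    """Return {column: number of distinct non-NULL values} for each column."""
    counts = {}
    for col in columns:
        # Identifiers cannot be bound as parameters, so pass trusted names only.
        query = f'SELECT COUNT(DISTINCT "{col}") FROM "{table}"'
        counts[col] = conn.execute(query).fetchone()[0]
    return counts


counts = distinct_counts(
    conn, "preferences", ["boolean_val1", "boolean_val2", "boolean_val3"]
)
print(counts)
```

A count of 1 only says every row holds the same value. It still can't tell an unused column from a preference nobody ever changed, which is exactly the question the schema leaves open.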
With that in mind, let's turn to time_format and date_format. If one were going "by the book" on normal forms, these should probably be foreign keys to a table which lists the options, but that means extra joins and boy, that just might be overkill if you only have a handful of options.
So it's not wrong that they store these as strings in the field. But it's worth noting that time_format has only two allowed values- 12 and 24. And date_format also has only two allowed values - eu and us. So again, not wrong, but with a solid understanding that there are only two possible values for each field, and that big pile of boolean_val* fields which may or may not be used, it's at least ironic that using booleans never occurred to them.
As a bonus, while time_format only has two possible values, the schema permits it to be null. That means there are definitely nulls:
mysql> SELECT DISTINCT(`time_format`) FROM `preferences`;
+-------------+
| time_format |
+-------------+
| 24          |
| 12          |
| NULL        |
+-------------+
Peter hadn't confirmed it yet, but the front-end likely wasn't expecting nulls, which would account for a number of reported bugs in the UI.
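If the two format columns really only have two legal values each, the schema itself could refuse everything else, nulls included. Here's a rough sketch of that idea, not a migration script for Peter's database: it runs against in-memory SQLite as a stand-in for MySQL, the helper try_insert is hypothetical, and it's worth remembering that MySQL only enforces CHECK constraints from version 8.0.16 onward.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE preferences (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        time_format TEXT NOT NULL DEFAULT '24'
            CHECK (time_format IN ('12', '24')),
        date_format TEXT NOT NULL DEFAULT 'eu'
            CHECK (date_format IN ('eu', 'us'))
    )
    """
)


def try_insert(conn, user_id, time_format):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO preferences (user_id, time_format) VALUES (?, ?)",
                (user_id, time_format),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "12": try_insert(conn, 1, "12"),    # allowed value
    "NULL": try_insert(conn, 2, None),  # rejected by NOT NULL
    "13": try_insert(conn, 3, "13"),    # rejected by CHECK
}
print(results)
```

With constraints like these, a null never reaches the front-end in the first place, so the UI doesn't have to guess what a missing time format means.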
Tags: CodeSOD |
Demo Most Dear |
Reese was driving home from work one day in 2012 when his cell phone rang out over his driving music. It wasn't a number he had stored in his contacts, but the area code and prefix were clearly from his office.
"Hey! This is Janet." An airy voice reverberated through his car's interior once he put the call on speaker. "I tried your extension first, but you didn't pick up. Anyway, we're waiting for you in the conference room!"
Janet the PM, Reese reminded himself. Having no idea what she was talking about, he frowned at the interminable line of cars ahead of him. "What?"
"Initrode wants to talk about integrating with our ERP. Remember?"
"Yes, I know. Now?" It was past normal business hours for most of the company.
"I sent out a meeting invite."
"I never got one."
"Well, why don't you go ahead and dial into the conference call?" Janet's cheer was undiminished.
"I can't do this while I'm driving!" Reese protested. He glanced at the dashboard clock, at the red traffic light glowing in the distance, then sighed. "Go ahead and get started. I'm not too far away, I'll turn around and be there soon."
"All right! We're in 4-B."
It took Reese 20 minutes to turn around and return to the suburban office park, annoyance smoldering in his chest the whole way. In the parking lot for his building, lights were cutting on as the sun approached the horizon. He turned off the engine, took a deep breath, then exited the car to hurry off to Conference Room 4-B.
While jogging through corridors and stairwells, Reese reminded himself about the potential client at hand. Initrode focused on point-of-sale systems like cash registers, and wanted help with pulling, consolidating, and reporting data from these machines on a daily basis. While the ERP offered by Reese's company wasn't state-of-the-art, it was more than qualified to handle this.
Reese finally reached the conference room door and pulled it open. The other meeting participants had staked claims around a speakerphone. Their heads all swiveled to stare at him as he dropped into a seat at the far end of the table.
"Great, our expert just stepped in!" Janet smilingly announced for the benefit of those on the phone. "Let me introduce you to Reese, he is our expert on client/server integration. Reese, we're talking with Ed from Initrode. Ed was hoping for a little more explanation about how their systems would communicate with ours."
"Sure." His heart still pounding from exertion, Reese struggled not to sound winded. "Uh, the standard way we handle integration is through a desktop client application that you would license from us and install on your machines. That application would communicate with our servers." They were working on a more modern REST interface, but as that was in its infancy, he couldn't bring it up.
There was a pause on the other side. "You mentioned a license? How much would that cost?"
"Three thousand," Janet replied.
"Three thousand?"
"Yes."
Another pause. "We were ... well, you said this application was the 'standard' method. Is there a non-standard method, then?"
Janet cast a pleading look toward Reese.
Reese nodded. "Well, we do have an OLE interface, but it's pretty old and unreliable. We've been phasing it out elsewhere—"
"How much would that cost?" Ed asked.
"I'm not sure," Janet replied. "I'd have to find out."
"I'd also have to do checking on my end to see whether it's even feasible," Reese cautioned. "I'll need more information on your current setup."
The meeting adjourned with everyone promising to forward information to everyone else. Over the next few days, Janet learned that using the OLE interface would be cheaper for Initrode, and once they heard that, that was all they cared about despite warnings of potential fragility and unreliability from Reese. Initrode then demanded an all-inclusive proof-of-concept before going forward with any formal sales or projects. This request smelled fishy to Reese, and he made his reservations known, but the powers that be insisted that he comply. Reese wound up producing a simple application in VBScript with an accompanying .NET library that did all the client/server heavy lifting. He also included a big dialog box that displayed each time the application was opened: "This is a demo application for testing purposes only."
Reese sent off the demo along with all the source code. Everything went quiet for a few months. Initrode seemed to disappear off the face of the Earth ... until, of course, the demo application they'd deployed into production and trained their staff to use began breaking down.
As Reese listened to Ed's frantic pleas on the phone, he had to bite his lip to keep from laughing. "I'm sorry, but that code was given to you as-is with no guarantees or support agreement. I need to escalate this to my boss."
When Reese went to his boss' office and explained the situation, she let out the laugh that he'd been forced to suppress. "Now we can muscle them into an actual project!"
Only that never happened. Initrode still refused to sign on the dotted line. They wanted bug fixes to the demo app, that was it. The changes they wanted were all minor tweaks they could've made themselves with the source code Reese had given them, but for some reason, they refused to touch it. Reese's initial development was billed as consulting. The frantic call from Ed was billed as consulting. Reese's bug fixes? Billed as consulting. Consulting rates were rather high, so high that it ended up costing three times as much as if Initrode had simply agreed to a project from the start.
Reese eventually left to pursue new opportunities at a different company. As far as he knew, Initrode's horribly expensive demo still lived on.
Tags: Feature Articles |
Error'd: Infinite NaN |
"For NaN easy payments of infinity dollars per month, this too can be YOURS!" Daniel B. writes.
"I really like the diverse colors offered by this vendor, though it's a shame they don't offer a diverse selection of translations," wrote Mathieu S.
"Oh great, according to the free Nectar loyalty point scheme I owe them over lb18!" Colin writes.
Joseph K. wrote, "No skipping this ad! Or maybe I'll be trapped forever? That's just how NaN works I guess..."
"Oh thank goodness! Here I was afraid that my coupon was expired, but I still have {days} days to use it," writes Matus.
Jacob Z. writes, "According to this financial planning calculator, it looks like I'll have a good long while before I can retire."
Tags: Error'd |
Failing the Test |
Like many dev teams, Rubi's team relies heavily on continuous integration. Their setup, like many others, relies on git hooks, and whenever someone pushes a commit to any branch, it automatically runs all the associated unit tests. Good code stays green, and any bugs are immediately revealed. Branches with failing tests cannot be merged into the main branch, which is all pretty reasonable.
Recently, Rubi pushed a commit up on a branch, and pretty much immediately realized that the tests were going to fail because she forgot to update a related code file. Even as she started to amend the commit, she waited for the CI server to cough up an error. And waited. And waited. And waited.
Now, for this particular repository, Rubi wasn't usually doing much development in it. She was helping with a big system upgrade, and so her first thought was that she must have made some other mistake. After all, it's not like automated CI would just get turned off, right?
Well, when she glanced at the YAML file which controlled their test runner:
test-service-integration:
  stage: test
  image: $RUNNER_BASE_IMAGE
  needs:
    - build-prod
  before_script:
    {.... other stuff removed}
  allow_failure: true
  tags:
    - run_in_docker
  only:
    - main
allow_failure: true meant that the CI server would let failing code pass, which generally wasn't a good idea. More than that, though, that rule applied to the main branch. In this configuration, tests were never run on other branches, and when they ran on the main branch, failures were allowed.
When she flipped the flag back to allow_failure: false, the unit tests in the main branch failed catastrophically.
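The two problems in that job definition are simple enough to check for mechanically. The sketch below is an illustration only: the job is transcribed by hand into a Python dict so no YAML parser is needed, the elided before_script is left out, and the function name audit_test_job is hypothetical rather than part of any CI tool.

```python
# The job from the YAML above, copied by hand into a dict.
# The elided before_script section is deliberately omitted.
job = {
    "stage": "test",
    "image": "$RUNNER_BASE_IMAGE",
    "needs": ["build-prod"],
    "allow_failure": True,
    "tags": ["run_in_docker"],
    "only": ["main"],
}


def audit_test_job(name, job):
    """Return human-readable warnings for a test job definition."""
    warnings = []
    # A test job that may fail cannot block a merge or a release.
    if job.get("allow_failure", False):
        warnings.append(f"{name}: allow_failure is set, so failing tests block nothing")
    # An 'only' list restricts the job to the named branches.
    only = job.get("only")
    if only is not None:
        branches = ", ".join(only)
        warnings.append(f"{name}: runs only on {branches}; other branches are never tested")
    return warnings


warnings = audit_test_job("test-service-integration", job)
for warning in warnings:
    print(warning)
```

Run against Roger's version of the job, it reports both issues; a job with allow_failure: false and no branch restriction comes back clean.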
Fortunately, their configurations were also source controlled, so it wasn't hard to find out who was responsible. Roger, one of their junior developers. He had made that change ten months ago, and no one else had touched the file since. Rubi pinged him on Slack and started a video call.
"Roger, what is going on with this?"
"Oh, that," Roger said. "Well, my tests were failing, well, not my tests, but one of the tests that somebody else wrote."
"So… you just turned the tests off?" Rubi said.
Roger shrugged. "Well, I had a feature to deliver for that sprint."
"I can't believe Felicia was okay with this," Rubi said. Felicia was the tech-lead on Roger's team. "She had to see this when you submitted your pull request."
"My what?" Roger asked.
"Your pull request? Y'know, for Felicia to review your code before it gets merged into main?"
"Oh, I just do all my work right on the main branch. Is that not okay? Felicia said it was fine, and that she didn't have time to review every change."
Rubi rubbed her temples and sighed. "No, that is not okay. Just… don't push anything else until I've talked to Felicia."
Rubi ended that call, and prepped for another video call, where she and Felicia could have a long discussion about how team leads can help their junior developers actually be successful, how code reviews contribute to that, and why leaving the automated unit tests off for ten months was a terrible idea.
Tags: Feature Articles |