How Linux Saved A Fast Food Giant.
I am a Windows guy. I have always used Windows at home, work, school, everywhere with the exception of my phone (iPhone, now Nexus One) and one Linux class at FIU. I have an A+ and MCTS in Windows Vista. Soon I will have my MCITP. I drink the Kool-Aid. But Linux saved me and the company I subcontract to, a large fast food giant, from near-total disaster. Last month McAfee posted a virus definition update that flagged SVCHOST.EXE as a virus. This is my story of what happened.
Windows needs svchost.exe to live. Removing it is like pulling the transmission or engine out of a car: you are not going anywhere. Without it, a workstation at best has no networking and at worst cannot boot at all. On Wednesday morning, our help desk started logging a high volume of calls reporting that the drive-through computer was rebooting and not loading back into Windows XP properly. Can you imagine how every IT guy felt when they realized that every single one of their XP machines had just decided to reboot and not load Windows XP properly? By Wednesday afternoon, McAfee had already handed us a fix that needed to be run on each system, a SUPER DAT file, as they call it. But how do you deploy a manual fix to nearly 700 restaurants spread across the continental US and Canada? Not to mention that none of these machines have keyboards, as they are touchscreen POS systems. So we needed a tech to deploy the fix manually. To every restaurant. Do you realize how many restaurants a giant fast food company has in the world? A lot. Even with carpet in our office, I heard people shitting bricks when we realized the magnitude of the problem and the possible ways to fix it. Thankfully the franchise restaurants do not use McAfee, so they were spared.
How did this get past McAfee QA? I can imagine how it happened, but it still blows my mind that there are no fail-safes in place at McAfee to prevent this. If there are fail-safes, how many are there and how did this issue get past all of them? Either way, McAfee has a lot to answer for. So how do you fix a problem that would otherwise take a considerable amount of money and man-hours? Only one thing came to my mind.
The touchscreen is useless while the machine is still booting up because in the BIOS, no drivers have been loaded yet. How were we going to run the fix if we can’t boot into safe mode? Our team had three ideas before I was able to help out (I was working on another assignment that was time-critical). One of the team members was testing a manual fix with a restaurant manager on the phone, and it succeeded. We had a confirmed working solution. It took about twenty minutes per restaurant, but it worked.
Let’s see. 20 minutes x 700 or so restaurants / 60 minutes in an hour equals roughly 233 man-hours. Holy balls! And that’s just my department, since we only cover the US and help Canada. The number of man-hours it would take to fix all of these worldwide is just incomprehensible to me.
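The back-of-the-envelope estimate above can be written as a small Python sketch. The 20 minutes and 700 restaurants come from the post; the eight-hour workday used for the second figure is only an assumption for illustration.

```python
def man_hours(minutes_per_site: float, sites: int) -> float:
    """Total hands-on effort in hours when every site needs the same time."""
    return minutes_per_site * sites / 60


HOURS_PER_WORKDAY = 8  # assumption for illustration, not a figure from the post

total = man_hours(minutes_per_site=20, sites=700)
print(f"{total:.1f} man-hours, or about {total / HOURS_PER_WORKDAY:.0f} eight-hour days")
```

Changing `minutes_per_site` to 60 gives a rough feel for what a one-hour-per-site procedure would cost at the same scale.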
Using some of our clout, we contacted Microsoft to see if they had anything that could help us, and they gave us a WinPE image that we could customize to run a script that would automatically fix the situation. Great, but we had two problems. First of all, we can’t run Microsoft’s proprietary .wim on our existing PXE environment. Secondly, the file size? Huge! We would have to push out nearly a CD’s worth of data to a lot of restaurants that do not have a great data connection under the best of circumstances (thanks, telco duopolies!). Some of our more rural restaurants had connections so bad, they were going slower than a 56k. We were at the mercy of a piss-poor national broadband system. We pretty much ended up having to save this as a last resort, because it was basically unfeasible to send that kind of data to all affected restaurants. Some restaurants had to resort to buying a USB thumb drive and driving to another restaurant to get the file. Their connections were so bad, they could not handle a 62 MB file. Just awful.
Our last choice was to re-image the POS system using our existing Ghostcast server infrastructure. While mostly automated, each one had to be kicked off manually in each restaurant and took nearly an hour. We wanted to avoid this at all costs just because of the huge amount of hassle involved. So how does Linux get involved? My idea was to create a small, self-extracting, PXE-bootable Linux system, mount an existing shared folder on the server in the restaurant, mount the workstation’s Windows partition with read/write access, delete the broken svchost.exe and the virus definition, copy over a working svchost.exe, and finally reboot the machine. Logically it could work, and it met all the criteria: small, fast, and, most importantly, fully automated.
I looked at using Damn Small Linux, which I had known about for a while, but sadly it does not include video drivers for the POS systems we had. Without video, there would be no way for us to know if anything went wrong with the script. After some searching via Google, I stumbled upon PLoP Linux with instructions on how to get it to work via PXE. After some hours of gnawing, scripting and fighting lack of sleep, I had a working proof of concept by 1 AM Thursday to show others that it was possible, and it clocked in at 70 MB uncompressed, compared to 250 MB or more for the WinPE image. I showed it to one of my superiors and he called in a few other people. I broke the POS system in our lab and showed them from start to finish how it worked. It was pretty much decided right then to use this method over the other three. After cleaning up the processes and adding several easter eggs, it was time to pilot it at 10 local restaurants. So while two team members went driving all over Miami, we set about compressing the files as much as possible. Using WinZip, the total package actually grew to 71 MB. How does a compression program add a megabyte to the package? I have no idea. So I went back to my old friend 7-Zip and created a self-extracting archive of 62 MB. Not bad. Now the real magic happens.
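To see why the package size mattered so much, here is a rough Python estimate of transfer times over a 56 kbit/s link, the speed the slowest restaurants are compared to above. It is a sketch under simplifying assumptions: decimal megabytes, the whole link available to the transfer, and no protocol overhead, so real transfers would be slower.

```python
def transfer_hours(size_mb: float, link_kbps: float) -> float:
    """Hours needed to move size_mb (decimal megabytes) over a link_kbps link."""
    bits = size_mb * 8 * 1_000_000
    seconds = bits / (link_kbps * 1000)
    return seconds / 3600


# Sizes quoted in the post; 56 kbit/s is the dial-up speed used as a yardstick.
packages = [
    ("PLoP package, 7-Zip self-extractor", 62),
    ("PLoP package, uncompressed", 70),
    ("WinPE image (lower bound)", 250),
]
for label, size_mb in packages:
    print(f"{label}: {transfer_hours(size_mb, 56):.1f} h")
```

Even under these generous assumptions, the WinPE image takes roughly four times as long to arrive as the compressed PLoP package.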
We have a scripting guy who is damn good. He created a script that would automatically distribute the self-extracting 7-Zip file with PLoP Linux to all of our restaurants, open it, and start Tftpd32 so the POS would boot into PLoP and run my script. Once the POS rebooted into XP, his script would kill the Tftpd32 PXE server and check that the bad DAT file from McAfee was indeed gone. After that it would report a successful fix to our SQL reporting server, which we could run reports from. A lot of my co-workers were amazed at my creative approach to solving this problem, but I was just as amazed, if not more so, at our scripter’s ability to control Windows via VBScript, PowerShell, and other tools.
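The decision logic of that verification loop, as far as it can be read from the McAfeeFix.vbs listing further down, can be summarized in a few lines of Python. This is an illustrative sketch only: the function and action names are made up here and are not part of the original scripts.

```python
def next_action(pos_reachable: bool, tftpd_running: bool) -> str:
    """Pick the next step from two observations (hypothetical names)."""
    if not pos_reachable:
        # POS is rebooting or still down: sleep and ping again.
        return "wait"
    if tftpd_running:
        # POS answers while the PXE server is up, so it is presumably still
        # in PLoP: wait for it to drop off the network, then stop Tftpd32 so
        # the next boot comes from the hard disk.
        return "wait_for_reboot_then_stop_tftpd"
    # POS is back and no longer PXE-booting: check the fix and report.
    return "verify_and_report"


observations = [(False, True), (True, True), (False, False), (True, False)]
for reachable, running in observations:
    print(reachable, running, "->", next_action(reachable, running))
```

Keeping the decision separate from the side effects (pinging, mapping drives, killing processes) makes this kind of loop easy to test on a desk before it goes out to hundreds of sites.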
Some final thoughts:
What does it say to you that you have to stumble around to achieve in Windows what can be easily done in Linux? And what about the reverse: my co-worker skillfully scripted out an entire deployment and verification process when I had no idea how to even start. I knew what needed to be done in theory, but had no idea how to execute it. This taught me that you should use the right tool for the job, seek out knowledge, and try out different things. If I had not flirted with Linux during college, where would we be right now? A month later and god knows how much money, we probably would just be finishing fixing everything up. Instead, even with other complications unrelated to either my script or my co-worker’s VBScript, we finished in just under a week. How come Microsoft’s tools don’t offer this kind of ease of use and flexibility? Or maybe they do and I have no idea they exist. Go on and experiment. It will be good for you.
I’ve since revived my old ASUS Z71v laptop with Ubuntu 10.04 Lucid Lynx for daily use. Now I know that if I ever need to fix an issue with Windows that deals with files, I can just use a Mini Linux.
Boot from network (PXE, DHCP, TFTP, Windows network share) – Windows Server
Disclaimer: Please keep in mind that all of this was done on the fly, with very little testing while being sleep deprived. If there are errors or bad practices, don’t say I didn’t warn you.
Here is the config I used for creating a self-extracting 7-Zip archive.
;!@Install@!UTF-8!
RunProgram="hidcon:McAfeeFix.bat"
InstallPath="C:\\Documents and Settings\\Administrator\\Desktop"
ExtractTitle="Fixing McAfee's Mistake..."
GUIMode="1"
SelfDelete="1"
;!@InstallEnd@!
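The post does not show how the self-extracting file itself was put together. 7-Zip's documentation for installer-style SFX archives describes building one by concatenating three files in binary: the SFX module, a config like the one above, and the .7z archive (on Windows, `copy /b module.sfx + config.txt + archive.7z output.exe`). A minimal Python sketch of the same concatenation follows; the function name and all file names are placeholders, not the ones used in this project.

```python
from pathlib import Path


def build_sfx(sfx_module: Path, config: Path, archive: Path, output: Path) -> int:
    """Concatenate SFX module + config + archive into output; return bytes written."""
    parts = [sfx_module, config, archive]
    with output.open("wb") as out:
        # Order matters: the module reads the config that follows it,
        # then unpacks the archive that follows the config.
        return sum(out.write(part.read_bytes()) for part in parts)
```

A call might look like `build_sfx(Path("module.sfx"), Path("config.txt"), Path("ploplinux.7z"), Path("McAfeeFix.exe"))`, again with hypothetical file names.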
Here is my pxelinux.cfg/default file.
default vesamenu.c32
prompt 0
timeout 1
menu background splash.png
menu title Welcome to McAfee fixer v1.0 super alpha power plus.
menu color border 37;40 #00000000 #00000000 none
menu color title 1;37;40 #00000000 #00000000 none
menu color tabmsg 40;37 #88888888 #00000000 none
menu color sel 1;37;42 #ffffffff #ff808080 none
menu color unsel 1;40;32 #ff00ff00 #00000000 none
label linux
menu label PLoP Linux
kernel bzimage
append initrd=initrfs.gz vga=1 smbmount=//SERVERIP/SHARENAME:USERNAME:PASSWORD
Here is what you should add to runme.sh so it automatically runs when PLoP loads up.
echo " Creating a NTFS mount point"
mkdir /mnt/windows
echo " Creating share mount point"
mkdir /mnt/plop
echo " Mounting NTFS"
mount -t ntfs-3g /dev/hda1 /mnt/windows
echo " Mounting plop share"
mount //SERVERADDRESS/SHARE /mnt/plop -o username=USERNAME,password=PASSWORD
echo " Copying the new update"
cp /mnt/plop/EXTRA.DAT /mnt/windows/Program\ Files/Common\ Files/McAfee/Engine
echo " Copying the working SVCHOST.EXE"
cp /mnt/plop/svchost.exe /mnt/windows/WINDOWS/system32
echo " Copying the Super Dat and batch file to run it."
cp /mnt/plop/SDAT5958_EM.exe /mnt/windows/Packages
cp /mnt/plop/sdatautorun.bat /mnt/windows/Packages
echo " Rebooting. Have a nice day."
shutdown -r now
McAfeeFix.bat:
NET SHARE plop="C:\Documents and Settings\Administrator\Desktop\pre plop\tftpboot\ploplinux"
CMD /C START "" "C:\BroadBand\tftpd32.bat"
START cscript "C:\Documents and Settings\Administrator\Desktop\pre plop\tftpboot\ploplinux\McAfeeFix.vbs"
sdatautorun.bat:
::To run the McAfee version of the fix SUPERDAT!
C:\rms\SDAT5958_EM.exe /SILENT /REBOOT
McAfeeFix.vbs:
'******************************
'
' Program: McAfeeFix.log
' Description: Loops until POS is online, then transfers the extra.dat to it.
' Version: 1.0
' Created: 04/21/2010
'**********************************************
'**********************************************
'=> These Constants Are Not Intrinsic to the Scripting Engine, So We Shall Define Them Here
Const FOR_READING = 1 ' For reading from an existing file
Const FOR_APPENDING = 8 ' For appending to an existing file when opening
'=> Application-Related Constants
Const FILE_LOG = "c:\out\McAfeeFix.log"
Const WRITE_NAPA_EVENT_SCRIPT = "C:\Packages\WriteNAPAEvent.vbs /Event:"
Const NAPA_EVENT_KEY = "MCAFEE_FIX"
Const MDSXMLPath = "C:\MICROS\Common\Etc\MDSHosts.xml"
Const FileName = "R:\Program Files\Common Files\McAfee\Engine\extra.dat"
'=> Global Scope Variables
Dim objLogFile
Dim mobjFSO
'=> Instantiate Global Scope Objects that are needed in every run of the script
Set mobjFSO = CreateObject("Scripting.FileSystemObject")
' OpenTextFile (not CreateTextFile) so that FOR_APPENDING is honored
Set objLogFile = mobjFSO.OpenTextFile(FILE_LOG, FOR_APPENDING, True)
Set mobjWSHShell = CreateObject("WScript.Shell")
LogProcess("Starting Main")
Call Main()
LogProcess("Ending Main")
objLogFile.Close
Set mobjFSO = Nothing
'**********************************************
'**********************************************
'
' Purpose: Waits for the POS to come back online, then checks
' that extra.dat is present on it and reports the result.
'
' Returns: N/A
'
'**********************************************
'**********************************************
Sub Main()
    Dim blnIsconnected
    Dim intCount
    Dim strDriveLetter
    Dim blnProcessCompleted
    Dim strProcessName
    Dim strIp
    Dim strEventKey ' NAPA_EVENT_KEY is a Const and cannot be reassigned
    strProcessName = "tftpd32.exe"
    strDriveLetter = ""
    intCount = 1
    blnIsconnected = FALSE
    blnDriveConnected = FALSE
    blnProcessCompleted = FALSE
    strEventKey = NAPA_EVENT_KEY
    Call CheckXML(strIp)
    ' CheckXML and GetPOSIp set strIp to "FAILED" on error
    If strIp <> "" And strIp <> "FAILED" Then
        Do While blnProcessCompleted = FALSE
            Wscript.echo "Loop wait " & intCount
            blnIsconnected = IsConnected(strIp)
            If blnIsconnected = FALSE Then
                WScript.Sleep 3000
                If intCount < 50 Then
                    intCount = intCount + 1
                Else
                    intCount = 0
                End If
            Else
                If IsRunning(strProcessName) = FALSE Then
                    Call MapDrive(strIp)
                    If mobjFso.FileExists(FileName) Then
                        Wscript.echo "Successfully copied fix to the POS."
                        Call LogProcess("Successfully copied fix to the POS.")
                        blnProcessCompleted = TRUE
                        Shell(WRITE_NAPA_EVENT_SCRIPT & NAPA_EVENT_KEY & "." & intCount & ".SUCCESS")
                    Else
                        Wscript.echo "Failed to copy fix to the POS."
                        Shell(WRITE_NAPA_EVENT_SCRIPT & NAPA_EVENT_KEY & "." & intCount & ".FAILED")
                        blnProcessCompleted = TRUE
                    End If
                    Call DisconnectDrive()
                Else
                    ' POS is still up under PXE; wait for it to reboot, then stop the PXE server
                    Do While IsConnected(strIp)
                        WScript.Sleep 3000
                        Wscript.echo "TFTPD32.exe running - Loop count: " & intCount
                    Loop
                    Call KillProcess(strProcessName)
                End If
            End If
        Loop
    Else
        strEventKey = NAPA_EVENT_KEY & "HOSTFILE.FAILED"
        Call LogProcess("Did not find POS")
    End If
    Shell(WRITE_NAPA_EVENT_SCRIPT & strEventKey)
End Sub
'***************************************************************
'***************************************************************
'
' Purpose: Kills the process passed as a parameter.
'
' Returns: N/A
'
'***************************************************************
'***************************************************************
Sub KillProcess(ByVal strProcessName)
    Dim strComputer
    Dim objWMIService
    Dim colProcessList
    Dim objProcess
    On Error Resume Next
    strProcessName = UCase(strProcessName)
    strComputer = "."
    Set objWMIService = GetObject("winmgmts:{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
    Set colProcessList = objWMIService.ExecQuery ("SELECT * FROM Win32_Process WHERE Name = '" & strProcessName & "'")
    For Each objProcess in colProcessList
        Call LogProcess("Process " & objProcess.Name & " will be terminated.")
        objProcess.Terminate()
    Next
End Sub
'***************************************************************
'***************************************************************
'
' Purpose: Checks if a process is running
'
' Returns: TRUE if running, False if not
'
'***************************************************************
'***************************************************************
Function IsRunning(ByVal strProcessName)
    Dim objWMIService, objProcess, colProcess, strComputer, blnIsRunning
    strComputer = "."
    blnIsRunning = FALSE
    Set objWMIService = GetObject("winmgmts:{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
    Set colProcess = objWMIService.ExecQuery ("Select * from Win32_Process WHERE Name = '" & strProcessName & "'")
    For Each objProcess in colProcess
        ' Compare case-insensitively, as the WQL query above already does
        If LCase(objProcess.Name) = LCase(strProcessName) Then
            blnIsRunning = TRUE
        End If
    Next
    IsRunning = blnIsRunning
End Function
'***************************************************************
'***************************************************************
'
' Purpose: Checks if the IP passed is on the network
'
' Returns: True if the host answers a ping, False if not
'
'***************************************************************
'***************************************************************
Function IsConnected(ByVal strIpAddress)
    Const OpenAsASCII = 0
    Const FailIfNotExist = 0
    Dim TempFile, ConnectFile
    TempFile = mobjFso.GetSpecialFolder(2).ShortPath & "\" & mobjFSO.GetTempName
    ' Ping twice and capture the output; a reply line contains "TTL="
    Shell("%comspec% /c ping.exe -n 2 -w 500 " & strIpAddress & " > " & TempFile)
    Set ConnectFile = mobjFso.OpenTextFile(TempFile, FOR_READING, FailIfNotExist, OpenAsASCII)
    Select Case InStr(ConnectFile.ReadAll, "TTL=")
        Case 0
            IsConnected = False
        Case Else
            IsConnected = True
    End Select
    ConnectFile.Close
    mobjFSO.DeleteFile(TempFile)
End Function
'**********************************************
'**********************************************
'
' Purpose: Maps drive R: to the C$ share on the POS
'
' Returns: N/A
'
'**********************************************
'**********************************************
Sub MapDrive(ByVal strPOSIp)
    Dim objNetwork, colDrives, intDrive, strDriveLetter, blnDriveConnected
    wscript.echo "beginning drive map: " & strPOSIp
    Set objNetwork = CreateObject("WScript.Network")
    blnDriveConnected = FALSE
    strDriveLetter = "R:"
    Shell("net use R: \\" & strPOSIp & "\c$ /user:USERNAME PASSWORD /persistent:no")
    ' Enumerate after "net use" so the new mapping shows up in the list
    Set colDrives = objNetwork.EnumNetworkDrives
    For intDrive = 0 To (colDrives.Count - 1) Step 2
        If colDrives.Item(intDrive) = strDriveLetter Then
            blnDriveConnected = TRUE
        End If
    Next
    If blnDriveConnected = FALSE Then
        Shell("net use R: \\" & strPOSIp & "\c$ /user:USERNAME PASSWORD /persistent:no")
        Set colDrives = objNetwork.EnumNetworkDrives
        For intDrive = 0 To (colDrives.Count - 1) Step 2
            If colDrives.Item(intDrive) = strDriveLetter Then
                blnDriveConnected = TRUE
                Call LogProcess("Drive mapped using secondary password.")
            End If
        Next
    Else
        Call LogProcess("Drive mapped using primary password.")
    End If
    Set objNetwork = Nothing
End Sub
'***************************************************************
'***************************************************************
'
' Purpose: Disconnects mapped drive
'
' Returns: N/A
'
'***************************************************************
'***************************************************************
Sub DisconnectDrive()
Dim strDriveLetter
strDriveLetter = "R:"
Shell("net use " & strDriveLetter & " /delete")
End Sub
'***************************************************************
'***************************************************************
'
' Purpose: Checks existence of the MDSHosts.xml File
'
' Returns: N/A
'
'***************************************************************
'***************************************************************
Sub CheckXML(ByRef strIp)
    If mobjFso.FileExists(MDSXMLPath) Then
        Call LogProcess("MDSHosts.xml is found on BOH.")
        Call GetPOSIp(strIp)
    Else
        strIp = "FAILED"
        Call LogProcess("Error: No MDSHosts.xml file found!")
    End If
End Sub
'***************************************************************
'***************************************************************
'
' Purpose: Gets IP for POS from MDSHosts.xml file
'
' Returns: N/A
'
'***************************************************************
'***************************************************************
Sub GetPOSIp(ByRef strIp)
    Dim objChildNode, objXMLDOM, strXPath, colNodes, objNode, mstrIpAddress
    Set objXMLDOM = CreateObject("Microsoft.XMLDOM")
    objXMLDOM.async = False
    If objXMLDOM.Load(MDSXMLPath) Then
        strXPath = "/NODES/NODE[IsBackupServer='T']/IPAddress"
        Set colNodes = objXMLDOM.selectNodes(strXPath)
        For Each objNode In colNodes
            strIp = CStr(objNode.Text)
            Call LogProcess("Found POS IP as: " & strIp)
        Next
    Else
        strIp = "FAILED"
        Call LogProcess("Error: Error getting IP for POS!")
    End If
    Set objXMLDOM = Nothing
End Sub
'**********************************************
'**********************************************
'
' Purpose: Writes to Update Log
'
' Returns: N/A
'
'**********************************************
'**********************************************
Sub LogProcess( _
    ByVal strEventName)
    'Dim objScriptLogFile
    'On error resume next
    'Set objScriptLogFile = mobjFSO.OpenTextFile(mstrScriptLogName, FOR_APPENDING, True)
    objLogFile.WriteLine(Now & " - " & strEventName)
    'objLogFile.Close
End Sub
'**********************************************
'**********************************************
'
' Purpose: "Shells" (Executes) strCommand. Waits Until
' Completion
'
' Returns: N/A
'
'**********************************************
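'**********************************************
Sub Shell( _
    ByVal strCommand)
    Const WINDOW_HIDDEN = 0 ' VBHideWindow is not predefined in VBScript
    On Error Resume Next
    Call mobjWshShell.Run(strCommand, WINDOW_HIDDEN, True)
    If Err.Number <> 0 Then
        ' LogError was never defined in this script, so log through LogProcess
        Call LogProcess("Error Executing " & strCommand)
        Err.Clear
    End If
End Sub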
57 Comments
duude, congrats!! excellent use of linux I would love to be able to do this kind of things, i'm just your regular triple boot (snowleo,win7,ubuntu) user! Hope you get a promotion and a damn good salary boost
Impressive out-of-the-box thinking.
Someone give this person a raise.
Great article. I love how you don't resort to the typical Linux fanboy thinking, that because Linux worked in many situations it's ideal for everything. You utilize each system to leverage data; Linux and Windows. That's what computing is all about.
I would recommend you blank out the user names and passwords in your scripts. I'm sure no one can get in and those may be false login credentials anyway, but nevertheless, I recommend changing them.
Great post!
You are correct that the username is indeed fake as well as the password. Something that simple for a production environment is just asking for trouble. For clarity I will edit it to be the same as the others.
Actually if they had used Linux in the first place they wouldn't have needed to pay McAfee or deal with virus protection at all and this problem wouldn't have happened. Now go to the suits and show them how they can save money by switching to linux. No McAfee for one thing, I'm sure that costs a lot.
They won't switch. They are too unfamiliar with it, and there are maybe a handful of people in the building, if that, who know how to really use Linux. Plus, after the serious investment made in the whole infrastructure, software, patches, etc., switching would be akin to admitting the past two years have been a colossal mistake. In this day and age, no one wants to admit a mistake or be held accountable.
Great story! I've had very similar experiences (or should I say horror stories) administering Windows based POS systems, although definitely not nearly at the same scale.
It's unfortunate that there aren't more Linux based point of sale systems out there, that is something I feel the restaurant industry could benefit greatly from, both in terms of cost and maintainability.
ViewTouch POS not only is Linux POS but the guy behind it has been at it for 30 years or more. In fact, he's the guy who first created what we think of today as graphical touchscreen POS. My father's friend has a restaurant with ViewTouch POS since 7 or 8 years ago and it's NEVER down.
> How does a compression program add a megabyte to the package? I have no idea.
It shouldn’t take much thought to realize that a compression program that could always compress *everything* could never work. Otherwise, you could keep zipping the file and making it smaller and smaller until there was nothing there. And what would an empty file unzip to if everything could be zipped up time after time until it became one?
That said, it *is* possible to limit the amount a compressor increases the file size of something by. You just append a "1" to the compressed data if you were able to compress it. Otherwise, you leave it the hell alone and append a "0". Decompressing it is easy: either remove the zero and do nothing or remove the 1 and decompress it. (Or you can just tell the user that compressing this was a bad idea, because there was no entropy to remove and avoid all that. Whichever.)
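The flag trick described in this comment can be sketched in a few lines of Python. This is only an illustration of the idea, not how WinZip or 7-Zip actually store data: it uses `zlib` as a stand-in compressor and spends a whole trailing byte on the flag rather than a single bit, and the names `pack` and `unpack` are made up for the example.

```python
import zlib

FLAG_RAW = b"\x00"         # payload was left alone
FLAG_COMPRESSED = b"\x01"  # payload is zlib-compressed


def pack(data: bytes) -> bytes:
    """Compress only when it helps; output is at most one byte longer than input."""
    compressed = zlib.compress(data, 9)
    if len(compressed) < len(data):
        return compressed + FLAG_COMPRESSED
    return data + FLAG_RAW


def unpack(blob: bytes) -> bytes:
    """Reverse pack() by inspecting the trailing flag byte."""
    body, flag = blob[:-1], blob[-1:]
    if flag == FLAG_COMPRESSED:
        return zlib.decompress(body)
    if flag == FLAG_RAW:
        return body
    raise ValueError("unknown flag byte")
```

Data that is already compressed, like most of a packed Linux image, tends to take the raw branch, which keeps the growth to the flag alone.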
Really great article and no Windows/Linux fan boy ranting either! Good work
do donate/become a member to the linux foundation -http://www.linuxfoundation.org/about/join
I am a donating member of the EFF if that counts. Honestly, I am currently paying off my recent wedding and underemployed. If I could donate, I would.
Oh, I apologize – I didn't mean you. I meant appealing to the suits in the fast food company to help the organization that saved their butts.
Great story and great lesson!
Don’t forget to return the favor to the community (collaboration, sponsorship, donations…)
I hate WINDOZE!!!!!
Surely the failure here was to have ~700 AV installs updating directly from the manufacturer? Do you allow the *OS* to update itself overnight without requiring patches to be verified in a staging environment? I would hope not, given your obvious awareness of the cost of failure in a poorly-connected, geographically diverse enterprise.
Whilst the fix itself may be interesting and indicative of the flexibility that Linux brings, I genuinely hope someone inside the company got sacked for negligence here; from the information the author provides, they deserve it.
This is exactly what I was thinking. I'm amazed that they don't have their own Windows Update and AV servers cascading these fixes first to an experimental environment, then a staging environment, then a pilot set of stores close to the office, and then nationally. The risk of a bad update is much higher than a security risk for the few days this process would take.
I guess these guys are too young to remember the days when not all updates worked (I'm thinking Windows NT updates).
There are plenty of older IT people here, but my department doesn't handle the AV. That's actually outsourced to another sub-contractor.
I think the delay of distributing AV definition/engine updates after review is not acceptable given the time sensitivity. They should come direct from the mfgr, and at least check once a day. The demonstrated rarity that this happens from big vendors is worth it given the possible consequences of an internet connected computer not being updated quickly in case of a virulent threat.
Any other form of update I'd agree that needs to go through test, pilot, distribute. Now I also think their IT department was somewhat negligent for not having an automated way for POS devices to boot to a local copy of the working mirrored OS, or to have automated Windows System Restore on failed boot.
I have solved many Windows issues with the help of Linux boot disks, but on a much smaller scale. It was nice to read about your approach. I have learnt that keeping a Linux disk handy is very helpful during emergencies.
The first mistake was running Windows XP POS systems directly exposed to the internet to begin with.
The POS systems are not directly on the internet. They aren't even indirectly. The updates were pushed down the private MPLS network.
Maybe people in general would have been better off if those Burger King stores weren't able to operate and dish out those shitty unhealthy burgers to people. I'm not so sure what you did here was actually a good thing in that sense.
I won't disagree with you, but the stores were still working regardless of whether the issue was fixed. After all, it was just one POS that went down. It's just that something like 80% of business is done through the drive-thru.
Congrats to your Scripting Guy! Why Ubuntu? I've used SystemRescueCd (http://www.sysresccd.org/). Linux User 409663
I'm most familiar with Ubuntu. I've been using it since probably version 5.04.
Maybe this experience will also convince you why you should work towards removing windows from those POS terminals and going with Linux running there as well.
If I had my way, I probably would. But I am not the one in charge of such things.
The constant flashing of that "twitter search goodies" box is annoying as hell – I had to adblock it
You should take a look at Perl for Windows; that way you would have one language to use on Windows, OS X, and Linux/UNIX.
http://www.amazon.com/Perl-Scripting-Windows-Security-ebook/dp/B001DAHXDS/ref=sr_1_2?ie=UTF8&m=AG56TWVU5XWC2&s=digital-text&qid=1274194814&sr=8-2
http://www.amazon.com/Automating-System-Administration-Perl-Efficient/dp/059600639X/ref=sr_1_5?ie=UTF8&s=books&qid=1274194814&sr=8-5
You should replace the whole POS system with Linux if you can.
Sadly I am but a gun for hire and have no say in what happens. A common occurrence, I'm sure.
Congratulations on a brilliant save. In retrospect, it is easy to see a lot of "better" solutions, but any solution that works NOW is better than a theoretical solution that works later. You apparently have already done some additional research into smaller Linux solutions, some of which are less than 1MB.
It is infeasible to change the existing system to Linux in the short term, but it may be feasible to boot into Linux and then have Linux boot XP. If XP fails, Linux can take over and allow you to do an automated recovery.
What a nice and creative way to use (linux) technology.
Thanks for sharing.
I'm literally astounded at all of the inappropriate use of Microsoft Windows these days. The only reason you should use Windows for anything is for the user experience – i.e. desktop machines. Windows has no place in purpose-built machines with a well-defined, task-specific UI, such as a point of sale machine.
Can you imagine the amount of money you'd save by not having to purchase a Windows license, and a McAfee license, for each of those CASH REGISTERs? Not to mention maintainability: as you saw with this incident, Linux is FAR easier to fix in remote, hard-to-reach or low-bandwidth locations.
I agree. One grocery store I frequent had their POS cash registers running Win98 (don't ask), and one crashed while ringing up the customer just in front of me. The next time I came, they were all running a non-Windows system (I couldn't tell which, but it wasn't Windows). They must have lost some money paying first for Windows and then for the replacement.
And your company STILL uses Windows? Wow. How much money did the BK king lose that day?
Nothing, because McAfee picked up the tab on everything. They kinda had to.
Great article. I'm a bit of a Linux zealot myself (although not at all a fan of the GNU/FSF radicals – go figure), but I really appreciate the even-handed treatment you presented here. Further, your decorum in response to the comments section of the article speaks volumes, and sharing the scripts with us was a kindness. It's good to see there are still some adults on the internet.
We completely dumped Microsoft. A simpler solution. Getting the video to work on the client machines is usually a lot easier than the author surmises.
Well done.
I like four things about this story:
1. Thinking about the real problem, and remembering that a computer is a computer, not a Windows machine. It's all 1s and 0s. You needed to understand what was wrong, and what had to be altered to put it right. Not just invoke graphical installers and automatic updates. Identify the file and overwrite it.
2. Letting your colleague write the script in an environment that he could make work, and taking the mental leap to hunt round for the right mini-distro
3. Not wasting time stripping down the linux system smaller – it could probably all have been done with half the code omitted. But this was not a research project, and you knew when to stop.
4. Publishing not just the story but the scripts. The Open source community saved your company, and you paid back. Thank you.
Nice article…linux is a savior a lot of times, ain’t it ?
After an experience like that I would have left Windows completely. No need to have AV with Linux after that which caused your problem in the first place.
@Slappy,
You can't do that in a corporate environment where you've got machines with Windows on them already. And! You don't need AV with Linux.
A great story from the support tech frontlines! Yes, Linux saves the day. And you already tasted its biggest shortcoming (not having video drivers in DSLinux).
“Yes, Linux saves the day. And you already tasted its biggest shortcoming (not having video drivers in DSLinux).”
That is DSL’s shortcoming, not Linux’s shortcoming.
This is the usual situation that comes back to bite you and reveals bad product planning.. and I hope you really get more than "you're super" from corporate :)
On the other side, my overprotective behavior during development usually just irritates people :)
Great hack!
However, I am sure it is forgotten for now, until the next Windows-related fire drill.
Congratulations! Why don’t you think about switching entirely to Linux and dumping Windows?
Consider donating to the Linux community some of the money you would have lost fixing this manually with another solution. That’s one of the best ways to say “Thank you”.
@Edwin
I'm surprised that the PoS systems aren't using an embedded OS like XP Embedded and don't have a failover recovery mode. A failover mode could be done with existing systems by using a customized BIOS that reboots to a FLASH storage device if the primary OS fails to boot within a specified period. The FLASH device could have a small Linux distro with SSH active or BartPE with RDC. Recovery procedures and analysis could then easily be performed remotely.
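The timeout-based failover suggested above boils down to a small decision rule. The sketch below is a hypothetical model of it, not real BIOS or firmware code: `watchdog_decision`, the "ready" flag and the 300-second default are all assumptions made for illustration.

```python
def watchdog_decision(seconds_since_boot, os_reported_ready, timeout_s=300):
    """Decide what a boot watchdog should do (illustrative model only).

    seconds_since_boot: time since control was handed to the primary OS.
    os_reported_ready:  True once the primary OS has signalled a good boot.
    timeout_s:          the "specified period" from the comment (assumed value).
    """
    if os_reported_ready:
        return "stay"            # primary OS came up; nothing to do
    if seconds_since_boot >= timeout_s:
        return "boot-recovery"   # give up and boot the recovery flash device
    return "wait"                # still inside the allowed boot window
```

The recovery image itself (a small Linux distro with SSH, or BartPE with RDC, as the comment suggests) is outside the scope of this sketch.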
Not all of the POS systems are XP, in fact only one is. The others are Win CE embed that are basically thin clients.
Congratulations for the good work and for posting the solution.
Thanks a lot Edwin for this very informative article. Since this issue could occur again and help other users, I'd like to make sure I understood the whole process:
1. The POS ("Point of sale", ie. cash registers) run XP, and were hosed by the McAfee false-positive: http://news.cnet.com/8301-1009_3-20003074-83.html
Besides booting from their hard-disk, POS clients can also download a bootable image into RAM from a server through the PXE protocol, and boot from RAM. I assume POS clients are natively configured to first start looking for a PXE server before attempting to boot from their hard-disk.
2. In addition to POS clients, each restaurant has a Windows server that can run TFTP32 to act as PXE server (http://tftpd32.jounin.net). TFTP32 includes a DHCP server to send IP configuration including the location of the TFTP server from which to download a bootable OS image, and a TFTP server to actually send the bootable image
3. TheRealEdwin + friends' solution:
- into a self-extracting 7ZIP file, pack TFTP32 and PLoP which is a 60MB Linux image that can be downloaded from clients through PXE and run from RAM
- use a VBScript to push this ZIP file to the Windows server located in each restaurant, and start TFTP32 on the server to let POS clients download PLoP
- The PLoP image contains a small bootup script that will 1) mount the Windows server as read-only and the local POS as read-write, 2) remove the bad SVCHOST.EXE and DAT file from the POS, 3) download a clean SVCHOST.EXE from the server, 4) download and install the VBScript that will be run when the client POS reboots under XP, and 5) reboot the POS client automatically
- The VBScript will run the McAfee fix, report success to a SQL server, and kill TFTP32
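As a rough illustration of the file-level work in steps 2 to 4 of the bootup script summarized above, here is a minimal sketch. It is not the author's script: the function name, the `run_fix.vbs` file name, the DAT path parameter and the Startup folder location are all assumptions, and mounting the disks (step 1) and rebooting (step 5) are left out.

```python
import shutil
from pathlib import Path


def repair_pos(pos_root, server_share, bad_dat_relpath, fix_script_name="run_fix.vbs"):
    """Replay steps 2-4 of the summary against already-mounted paths.

    pos_root:        mount point of the POS disk (read-write).
    server_share:    mount point of the store server share (read-only).
    bad_dat_relpath: location of the bad DAT file relative to pos_root
                     (hypothetical; the real path is not given in the thread).
    """
    pos_root, server_share = Path(pos_root), Path(server_share)
    system32 = pos_root / "WINDOWS" / "system32"
    actions = []

    # Step 2: remove the bad svchost.exe and the bad DAT file if present.
    for bad in (system32 / "svchost.exe", pos_root / bad_dat_relpath):
        if bad.exists():
            bad.unlink()
            actions.append(f"removed {bad.name}")

    # Step 3: copy a clean svchost.exe from the server.
    system32.mkdir(parents=True, exist_ok=True)
    shutil.copy2(server_share / "svchost.exe", system32 / "svchost.exe")
    actions.append("restored svchost.exe")

    # Step 4: stage the script that runs on the next XP boot
    # (assumed here to live in the All Users Startup folder).
    startup = (pos_root / "Documents and Settings" / "All Users"
               / "Start Menu" / "Programs" / "Startup")
    startup.mkdir(parents=True, exist_ok=True)
    shutil.copy2(server_share / fix_script_name, startup / fix_script_name)
    actions.append(f"staged {fix_script_name}")
    return actions
```

Returning the list of actions makes it easy to log what happened on each register before it reboots.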
What I don't understand, is that each POS cannot kill TFTP32 on the server. I would expect the PLoP script to simply wait until all POS have received the fix from the server, allowing the local techie to kill TFTP32 on the server before rebooting all POS so they can proceed with running the VBScript that will fix them for good.
I'd appreciate if Edwin or someone else could confirm/correct the above.
Also, the Mini Linux page on Wikipedia (http://en.wikipedia.org/wiki/Mini_Linux) lists some much smaller images than PLoP such as TinyMe, Tiny Core Linux, or BasicLinux: Would some experienced user tell us which Linux image they would recommend to perform the same type of crash recovery (ie. just to boot up the client, download some files from a server, write them to the Windows client, and reboot)?
Thank you.
Mate, those Twitter comments are mighty annoying… they completely obscure the real comments (the Reddit comments are also not very helpful, being without context…).
OT: Besides that, you might be interested in Lucho’s cycling blog:
http://cyclinginquisition.blogspot.com/
You guys get A+ for creativity and thinking out of the box.
However, here are some questions you should ask yourself (and probably give yourself a D for).
1. How come you are using Windows for a simple standalone touch screen application? What is WinXP good for in this case?
2. Why are you running McAfee on these systems? How can they be infected by viruses if they are only touch screen burger sale stations? Who's the smart sales person who convinced you to run AV on your machines?
3. If you decided to use Windows, and McAfee, how come you are not testing the McAfee updates in a lab environment before pushing them to all of your network?
4. How come you have such a challenging, geographically sparse, and inaccessible network, and at the same time you don't have a disaster recovery plan in place (and tested)?
As my grandpa used to say: "A wise person never gets into a trouble which a clever person knows how to get out of."
1. It comes from the vendor that way. I don't know, I wasn't here two years ago when they made that decision.
2. Again, I have no idea; I wasn't here at the time. I've heard that it's just protocol that any machine running Windows has to have AV. A McAfee sales person probably sold it to us.
3. We should and I agree with you, but that's outsourced somewhere else. Not even my department.
4. I didn't design any of the current systems and I certainly would not have done it this way. A lot of the decisions made are based on standards set by credit card processors. Similar to how healthcare IT has to deal with HIPAA, we have to deal with the credit card equivalent.
If I had any power at all here, things would be very different. But nothing will ever change. It just doesn't happen with these large organizations. After all, I am just a temporary contract guy.
No need for AV on linux (yet..) =). Compliments on the article, a great read. It is possible for a trojan to disguise itself as svchost.exe by the way, had it myself a couple of years ago when I was still using windows xp.