WEBVTT

00:17.520 --> 00:21.480
Hey, everyone. Welcome back to IT Horror Stories with Jack Smith.

00:21.980 --> 00:25.280
I am Jack Smith, and this time across the table, I am joined

00:25.780 --> 00:29.420
by Philip. Good evening, sir. Hi. Good evening. How are

00:29.920 --> 00:33.860
you doing? Pretty well. Things as usual. Well, looking at

00:34.360 --> 00:40.260
the story you brought, things as usual does not sound really motivating.

00:40.900 --> 00:43.940
Yeah, it's been a wild ride. Yeah.

00:44.100 --> 00:47.620
Because we did the DevOps Royale with Cheese

00:48.120 --> 00:52.020
episode 8, and then we did the Shadow IT Reports episode 9 a

00:52.520 --> 00:55.880
few months ago, and you basically came up with something

00:56.380 --> 00:59.880
similar, but yet very different. Yeah, that I did want

01:00.380 --> 01:04.560
to throw in here, because the original plan was: yeah, let's do more

01:05.060 --> 01:08.680
about shadow IT, because it is so broad, and even I didn't see this one

01:09.180 --> 01:12.440
coming. So I am going to give you

01:12.940 --> 01:16.560
the microphone. I am going to jump in left and right wherever I can.

01:17.060 --> 01:20.680
And I think I can be interesting, which is every five seconds. Everybody be

01:21.180 --> 01:23.560
warned. The microphone, sir Philip, is yours.

01:23.960 --> 01:27.080
Okay, so to give you some backstory,

01:27.580 --> 01:31.320
I was working as a contractor for a subcontractor for a telco.

01:31.640 --> 01:34.840
It was, you could say, a pretty big name

01:35.340 --> 01:39.240
around these parts. Okay. But the company I worked for was basically

01:39.740 --> 01:43.360
a subcontractor for them. And the reality was that that

01:43.860 --> 01:47.280
was more of a: yay, we're a telecommunications company.

01:47.780 --> 01:51.180
Yeehaw. So if you say a subcontractor,

01:51.680 --> 01:55.380
like an entire service provider, as in, we'll take care of your help desk,

01:55.880 --> 01:59.060
we'll take care of your field services. It was mostly

01:59.560 --> 02:03.620
field services, so we didn't have anything to do with the end users of said telco

02:04.120 --> 02:07.940
company, but it was more of the installation of the

02:08.440 --> 02:11.980
infrastructure and things like that. So I am now confused and I have questions

02:12.480 --> 02:16.100
about you guys. Maybe not you yourself. You were going out on the streets and

02:16.600 --> 02:19.980
setting up, say, the infrastructure so the modems, and you don't deal with the end

02:20.480 --> 02:24.660
user. How did you manage that? Well, it was the backend infrastructure,

02:24.740 --> 02:28.500
so. Oh, okay. Yeah, yeah. So the end users, they,

02:28.900 --> 02:31.940
to be fair, they have subcontractors for that as well,

02:32.020 --> 02:35.420
but I see, yeah. So you were doing services for

02:35.920 --> 02:38.660
a different contractor that dealt with the end users. Yes,

02:39.160 --> 02:43.220
exactly. Isn't the wonderful world of outsourcing

02:43.660 --> 02:47.260
overly, needlessly complicated? Oh, yeah, it's overly

02:47.760 --> 02:51.500
needlessly complicated by design and I think they really want to keep it that

02:52.000 --> 02:55.420
way. Wonderful, wonderful framing in

02:55.920 --> 03:00.380
complexity. Yeah, please go on. This particular subcontractor

03:00.880 --> 03:04.700
was what I like to call cowboy IT. So I think that's a phrase

03:05.200 --> 03:08.660
that your listeners might be familiar with. I'm familiar

03:09.160 --> 03:10.940
with it. Which is. Which.

03:12.230 --> 03:15.350
Yep. So you're basically shooting from the hip.

03:15.430 --> 03:18.470
Yep, basically. That kind of "what could go wrong?" Yeah,

03:18.970 --> 03:23.070
exactly. There was this particular day, I think it was a Tuesday or a

03:23.570 --> 03:26.710
Wednesday, I don't remember that in detail, but I came

03:27.210 --> 03:31.030
in in the morning as usual, you know, motivated and everything.

03:31.110 --> 03:34.270
Motivated and everything. I went to the coffee

03:34.770 --> 03:38.340
machine, got my coffee to start

03:38.840 --> 03:42.420
the day properly. And yeah, then I noticed that a lot of the help

03:42.920 --> 03:46.420
desk people were pretty frantic. I think

03:46.920 --> 03:50.660
about half an hour after I arrived, there were some murmurs of,

03:50.740 --> 03:54.059
yeah, there's a problem with the platform. So the

03:54.559 --> 03:58.380
platform, quote, unquote, the platform about half an

03:58.880 --> 04:02.140
hour after I started. Because that was also the time that the

04:02.640 --> 04:06.020
field engineers would be. Oh, yeah, so you would come in early and then

04:06.100 --> 04:09.770
once they get to the first on site customer, you're there just in case.

04:10.080 --> 04:14.080
Yes, exactly. Okay. And then, well, poof, it was

04:14.580 --> 04:18.160
getting a bit more busy than usual. They got a lot of calls,

04:18.660 --> 04:21.960
a bit more than they usually get. They do get a lot of calls usually,

04:22.460 --> 04:25.800
but this time it was, yeah, a lot different. I heard

04:26.300 --> 04:29.520
some of the help desk engineers say, oh, yeah, but the platform is down.

04:30.020 --> 04:33.400
And then another one responded, but, no, it's not down. It's always this slow.

04:33.900 --> 04:36.800
They just need to be patient and blah, blah, blah. The fact that it was

04:37.300 --> 04:40.960
slow is important, so keep that in mind. But they were saying, yeah,

04:41.460 --> 04:45.100
it can't be down. It was working perfectly yesterday and nothing changed.

04:45.600 --> 04:48.700
And nothing changed, of course, nothing changed. Famous last words,

04:49.200 --> 04:52.580
obviously. Well, at least you're not updating in

04:53.080 --> 04:56.140
production on a Friday. So at least it was on a Monday then. Yeah,

04:56.640 --> 05:00.220
yeah, exactly. I see no problem. No further problem. That's kind of

05:00.720 --> 05:04.540
the advantage of being in a cowboy IT environment. They kind of

05:05.040 --> 05:08.690
learned their lesson in a way that they shouldn't update things on Friday,

05:08.770 --> 05:12.530
especially not when there's happy hour on Friday. So priorities,

05:12.610 --> 05:14.850
priorities. Exactly. Lovely.

05:15.890 --> 05:19.130
It started to get more busy. A lot more

05:19.630 --> 05:22.850
calls were coming in and there were a lot of. A lot of people saying,

05:23.350 --> 05:26.650
but yeah, it is down, it is down and this is a disaster, and blah,

05:27.150 --> 05:30.770
blah, blah. As one does. Now, to give you some context, the platform

05:31.270 --> 05:35.570
they were talking about was a ticketing application for the field engineers. So basically

05:36.070 --> 05:39.810
there were about 200 field engineers and they would log into this specific

05:40.310 --> 05:43.790
application. It was basically to follow up on their work.

05:43.870 --> 05:47.230
So if they had to do an installation, they would do the logging in there.

05:47.730 --> 05:50.910
They would get their tickets there to start the work and you know,

05:52.270 --> 05:55.470
basically their entire day planning was in there. Yes, basically.

05:55.710 --> 05:59.470
Yeah, yeah, yeah. I can imagine. If that's not

05:59.970 --> 06:03.350
reachable, it's not going to be a good day. Yeah, yeah, you could say

06:03.850 --> 06:06.930
that. It was fairly, quote unquote, business critical. They were used

06:07.430 --> 06:11.090
to it being slow. Slow in the sense that, I don't know if your

06:11.590 --> 06:15.290
listeners are of the age that they were still using 56k modems,

06:15.790 --> 06:19.570
but you can kind of compare it with that. Our beloved statistics on

06:19.650 --> 06:22.810
YouTube, Spotify and Apple Music confirm

06:23.310 --> 06:27.010
the age bracket. So yes, we know. So you can imagine about two or three

06:27.510 --> 06:30.730
minutes per page refresh wasn't out of the ordinary. That's not

06:31.230 --> 06:34.510
even 56k modem. That is mail pigeon speed.

06:35.010 --> 06:38.230
Yes, exactly. Actually, I think there was this study that

06:38.730 --> 06:42.310
pigeons might have been faster actually. But why does my brain now go

06:42.810 --> 06:45.750
to IP over avian carriers? Yes, exactly.

06:46.250 --> 06:49.910
You know exactly what I'm talking about. Oh yes, we have discussed this multiple times.

06:51.670 --> 06:54.870
It was even that bad some days. So that

06:55.370 --> 06:58.910
you could almost prepare a full English breakfast. And then it's logical that they tell

06:59.410 --> 07:02.470
you, oh, just wait a few minutes, it'll get there. Yeah, exactly.

07:02.550 --> 07:05.590
I guess we're going to come back to this, but I don't

07:06.090 --> 07:10.030
want to know the hardware platform or the setup or the whatever, because too many

07:10.530 --> 07:14.190
questions now. And we are not in a third call

07:14.690 --> 07:18.670
at the moment, so I'm not going to care yet. You could say that

07:19.170 --> 07:22.510
it would be surprising for a telco company to have a setup like this.

07:23.010 --> 07:26.550
But more on that later. We have mentioned mainframes. Fair enough.

07:27.050 --> 07:30.870
In a few episodes. So I'm ready for

07:31.370 --> 07:35.150
most things, though I'm not sure that I want to talk about a

07:35.650 --> 07:39.190
VAX or a Wang, but this time instead

07:39.690 --> 07:43.670
of just it being slow, there were some people

07:44.170 --> 07:47.270
that were getting NGINX pages. So they were getting.

07:47.510 --> 07:51.070
For those familiar with NGINX, you can use it

07:51.570 --> 07:55.190
as a proxy. So that means that you can get error pages from NGINX

07:55.690 --> 07:59.030
itself. So you can get 404s and 504s from NGINX.

07:59.110 --> 08:02.820
It's not just a normal browser saying this page cannot be

08:03.320 --> 08:06.420
found. It's something in the middle saying: whatever I need to

08:06.920 --> 08:10.540
talk to is not there. Yeah. This endpoint doesn't exist, please investigate.

08:10.780 --> 08:14.060
So it's going to be something else. Yeah, exactly. Whoopsie daisy.
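
NOTE
Editor's sketch, not part of the spoken episode. The point above, that a
proxy can answer with its own error page when the backend is missing, could
be illustrated roughly like this. The function name and the hint texts are
hypothetical; only the general meaning of the status codes is standard HTTP.
```python
# Hypothetical helper: the name and the hint texts are illustrative only.
def gateway_hint(status):
    """Return a rough first guess about where an HTTP error originated."""
    hints = {
        502: "proxy got an invalid response from the backend",
        503: "service unavailable, backend or proxy is refusing work",
        504: "proxy timed out waiting for the backend",
    }
    if status in hints:
        return hints[status]
    if status == 404:
        # A 404 alone does not tell you which layer produced it.
        return "resource not found; could come from the proxy or the backend"
    return "no gateway-specific hint"
```
Calling gateway_hint(504) points at the backend behind the proxy, which
matches the direction the story takes next.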

08:14.560 --> 08:17.780
Yeah. So you can imagine that the project manager on duty was kind of

08:18.280 --> 08:22.220
sweating bullets and having an aneurysm. Probably on the verge of starting

08:22.720 --> 08:25.900
with a drinking problem. But you can imagine the chaos that

08:26.400 --> 08:29.480
was ensuing at that point. So. Oh, wonderful day.

08:29.560 --> 08:32.280
Yeah, exactly. So just a regular Tuesday.

08:32.520 --> 08:35.720
Or not. So we tried pinpointing the

08:36.520 --> 08:39.680
issue. The help desk people were

08:40.180 --> 08:42.360
frantically running around like headless chickens. At that point,

08:43.160 --> 08:46.800
a junior I remember very well was completely

08:47.300 --> 08:50.840
panicking saying that, yeah, I can't even ping the machine.

08:51.340 --> 08:55.040
It's not reachable. I don't know what's going on. Another one was frantically

08:55.540 --> 08:59.760
tapping his keyboard, trying to find out from the logs from the NGINX

09:00.260 --> 09:03.640
machine what was going on and basic troubleshooting steps,

09:04.140 --> 09:07.720
but things weren't going that well. Kind of an understatement. Yeah, because you

09:08.220 --> 09:11.480
have described headless chickens, so basically everybody is doing something

09:11.980 --> 09:15.600
and there's zero coordination going on. Yes, exactly. So of

09:16.100 --> 09:19.640
course I was there as a contractor. So you

09:20.140 --> 09:21.600
tend to shut up. I tend to shut up.

09:24.480 --> 09:27.720
I just want to see the world burn. If something comes to me, I'll give

09:28.220 --> 09:31.980
my opinion. Yeah, exactly. But even then, at some point it

09:32.480 --> 09:36.700
was like, okay, this is just getting too much now. Mind you, I wasn't

09:37.260 --> 09:40.900
fully aware of their infrastructure because, like I said, not my

09:41.400 --> 09:44.300
job. You're there to do one thing, not manage the infrastructure. Yes,

09:44.800 --> 09:48.140
exactly. Yeah. I was more on the development side at that point.

09:48.640 --> 09:52.220
So I was just working on my stuff locally. So whatever,

09:52.540 --> 09:55.980
they'll solve it, I guess, at some point. Yeah, the server, it

09:56.480 --> 10:00.340
is not your problem as such, you're not impacted. But can I take

10:00.420 --> 10:02.900
an educated guess here, going by what you've mentioned already?

10:03.620 --> 10:06.980
NGINX says I can't reach the machine. Somewhere someone

10:07.300 --> 10:10.540
found out the internal IP and said, well, it's not

10:11.040 --> 10:15.140
responding. Yes. If you're still running around like a bunch of headless chickens after

10:15.640 --> 10:19.260
that point, instead of trying to find out where, utopian idea,

10:19.760 --> 10:22.900
your documentation says that IP address is located.

10:24.410 --> 10:27.810
Yeah. So you can stop

10:28.310 --> 10:32.170
running around the chicken coop. Yeah, there's your root cause: the IP is down.
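
NOTE
Editor's sketch, not part of the spoken episode. The first triage step
described above, checking whether the backend address answers at all, might
look like this in Python. It uses a TCP connect instead of the ICMP ping
mentioned in the story, and the function name, host, port and timeout are
placeholders, not details of the real setup.
```python
import socket
# Hypothetical helper: host, port and timeout are placeholders.
def backend_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```
A False result only says the service is not answering on that port. It does
not say why, which is where the walk to the server room comes in.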

10:32.670 --> 10:36.290
And then, why is it down? Well, put on your data helmet

10:36.790 --> 10:40.970
and let's go digging. Yes, exactly. Now, to be fair, those hell

10:41.470 --> 10:45.170
deskers, they were not really on the operations side

10:45.670 --> 10:49.050
of things. So it was just the go-between. These are the hell

10:49.550 --> 10:52.810
desk guys. They don't know how the backend works. They know the tool, they know

10:53.310 --> 10:56.370
their pages, and that's all their pay bracket is there for.

10:56.610 --> 10:59.090
So absolutely nothing against the hell desk guys.

11:04.850 --> 11:08.730
We choose not to have any subscription model nor sponsoring in order to

11:09.230 --> 11:12.570
keep our stories accessible for each and everyone. To support us,

11:13.070 --> 11:17.230
please check our merchandise at shop.ithorrorstories.us or

11:17.730 --> 11:21.710
buy us a coffee at ko-fi.com/ithorrorstories.

11:30.830 --> 11:34.870
Yeah, exactly. So I can imagine them not

11:35.370 --> 11:38.470
being like, oh yeah, we need to check the server itself and blah,

11:38.970 --> 11:42.430
blah, blah. Usually you would go to the application owner now and say,

11:42.930 --> 11:46.210
hey, your thing is down. And then in your escalation matrix, you can find

11:46.710 --> 11:50.210
this person's contact details and hope he's not on vacation for three weeks. Yeah,

11:50.710 --> 11:54.170
exactly. Or create a priority one ticket and throw it up the chain, basically.

11:54.670 --> 11:57.930
So if you can't find anyone, you escalate up and maybe somebody knows someone.

11:58.430 --> 12:01.970
Indeed. Of course. I was in the emergency meeting and

12:02.050 --> 12:05.410
I kind of saw things unfold. And then I

12:05.910 --> 12:08.530
being a little bit proactive, was like,

12:08.920 --> 12:12.040
yeah, but where is this machine located?

12:12.120 --> 12:15.560
Is it cloud hosted? Is it in a data center somewhere?

12:16.120 --> 12:19.560
And then one of the other DevOps people said,

12:20.060 --> 12:23.600
oh yeah, but this particular server is in our data

12:24.100 --> 12:27.880
center. And I was like, okay, sure, but that's kind of my question. Where is

12:28.380 --> 12:32.040
this data center located? Because if you can't ping the machine, then something

12:32.540 --> 12:36.460
is wrong there. Apparently the data center was local in the

12:36.960 --> 12:40.260
sense that it was in the same building. Oh, that's a good one.

12:40.340 --> 12:44.500
It was basically the next room over from where we were sitting.

12:46.820 --> 12:50.380
This is going too easy. As you

12:50.880 --> 12:53.660
can imagine. I was already half intrigued,

12:54.160 --> 12:57.660
half horrified. So I suggested: let's have

12:58.160 --> 13:01.820
a look at that machine. Are we able to access that room? Is there a

13:02.320 --> 13:05.620
possibility that we can just go there and have a look

13:06.120 --> 13:09.980
with our own eyes at what's going on? "Are the blinking lights still blinking?" would be

13:10.480 --> 13:13.820
your first check. Is there something. Yeah, yeah, yeah. Indeed. The project

13:14.320 --> 13:17.500
manager was like, yeah, that's not a bad idea. And I was like, yeah,

13:18.000 --> 13:21.660
okay, I sometimes have those. So we went to investigate

13:22.160 --> 13:25.380
on how to access that room because it's like

13:25.880 --> 13:29.900
a broom closet. So I mean, we're looking for the

13:30.400 --> 13:34.140
key for this specific room. And that was a point where

13:34.460 --> 13:37.860
we went to the receptionist, our dearest

13:38.360 --> 13:40.760
Linda, the gatekeeper, high priestess of the key cabinet.

13:42.520 --> 13:46.000
I know the type. Yeah. So after

13:46.500 --> 13:49.720
a difficult and perilous search through the drawer of mystery,

13:50.220 --> 13:53.720
she emerged victorious with a small and unassuming

13:54.220 --> 13:58.000
metallic key. We managed to get into the quote unquote

13:58.500 --> 14:02.040
data center. So no keypad, no badge to enter. Security

14:02.200 --> 14:05.520
through obscurity. You'll never guess that this holds a data room.

14:05.750 --> 14:09.670
Yeah, exactly. And the only thing stopping

14:10.170 --> 14:13.590
you from entering was basically just Linda, wielder of the One Key to rule them

14:14.090 --> 14:17.750
all. So, yeah,

14:18.250 --> 14:21.590
we entered this so-called data center.

14:21.910 --> 14:25.910
Now at this point, obviously I wasn't expecting any biometric scanners

14:26.410 --> 14:28.870
or high level security detail or. No, no,

14:29.510 --> 14:35.960
no, we have passed this level. Exactly.

14:37.080 --> 14:40.200
But when we entered this specific

14:40.700 --> 14:43.720
room, it was honestly even worse than I expected.

14:44.280 --> 14:48.960
So my expectations were already low. But let's

14:49.460 --> 14:53.000
just say that we entered that room and instead of a data center, we found

14:53.500 --> 14:55.960
ourselves in a glorified storage closet with commitment issues.

14:57.480 --> 15:02.160
Okay, so how many people were on the floor usually?

15:02.660 --> 15:06.520
So are you talking 20, 50, 100 people? This was

15:07.020 --> 15:10.720
not the main office of this specific telco. Obviously at any given time

15:11.220 --> 15:14.240
I would say about maybe 30 people. Okay, yeah,

15:14.740 --> 15:18.520
yeah. So 30 people would put you into the thing of. Unless there's a

15:19.020 --> 15:22.520
security audit coming up, we're not gonna do anything with the data room

15:23.020 --> 15:26.160
and just forget it exists. Yes, basically that. Yeah.

15:26.800 --> 15:30.360
And then if you get an audit, you put up some

15:30.860 --> 15:34.700
plywood walls, you know, put in a door and say, well, it's separated

15:35.200 --> 15:39.060
now. Yeah. So this is officially a storage closet. Now here's a broom,

15:39.560 --> 15:43.700
here's a bucket, you know. Well, so, were the machines

15:44.200 --> 15:47.740
in a rack? Yes, funnily enough, they did have two small,

15:47.980 --> 15:50.220
I want to say 42U racks there.

15:50.940 --> 15:54.140
42 is full size. They were filled with a lot

15:54.640 --> 15:57.340
of hardware. I'll get back to that later. But. Well,

15:57.910 --> 16:01.670
let's just say that there were still some machines in business there that were

16:02.070 --> 16:05.590
old enough to have a driver's license. So what you often see is

16:05.670 --> 16:08.990
a machine or a service will get decommissioned, but then they forget the

16:09.490 --> 16:12.663
machine and it's left powered on, screaming for NT4 Service Pack

16:12.737 --> 16:16.190
6 updates. Yes. In the distance with two of the

16:16.690 --> 16:20.590
three RAID disks in error. So lots of screaming from piezo

16:21.090 --> 16:23.670
speakers going on. Yeah, you know, it's bad when the servers look like this.

16:23.820 --> 16:26.780
They've seen things. I can imagine the,

16:26.940 --> 16:30.060
the range of Compaq servers in front of me.

16:30.540 --> 16:33.740
Yeah, that have seen things. Yes. Yeah,

16:34.240 --> 16:37.900
you're not far off. So, I mean a couple of them were

16:38.400 --> 16:42.380
still blinking away. But like in that I'm trying my best,

16:42.460 --> 16:45.620
please let me retire kind of way, somebody shoot me.

16:46.120 --> 16:49.710
Somebody shoot me. Exactly. So we went ahead and fans

16:50.210 --> 16:53.830
groaning left and right aside, we went to investigate the machine that was

16:54.330 --> 16:57.390
allegedly running this specific application. Allegedly.

16:57.710 --> 17:01.590
But of course, yeah. My hopes of it still being up and running

17:02.090 --> 17:05.629
were lower than the remaining warranty of any machine in that room. So you can

17:06.129 --> 17:09.110
kind of imagine the situation at this point. Yeah, when you enter the room,

17:09.610 --> 17:12.670
you see all the old hardware and you see more red than green lights blinking,

17:13.310 --> 17:17.230
running. And finally the RAID container died or

17:17.580 --> 17:21.540
the thing just gave up and is now blue screened

17:22.040 --> 17:26.300
or kernel panicked, whatever platform you're using. Did you pray

17:26.800 --> 17:27.740
to Saint Rebootius?

17:30.620 --> 17:34.020
To be honest, I think if you would reboot any of those machines,

17:34.520 --> 17:38.340
they wouldn't start up anymore. So even praying to Saint Rebootius at

17:38.840 --> 17:41.420
that point would be proven useless, I suppose.

17:41.980 --> 17:45.700
Well, okay, so that

17:46.200 --> 17:49.980
wasn't done. What was next? Well, next we were trying

17:50.480 --> 17:54.100
to locate this specific machine, and honestly

17:54.600 --> 17:57.140
I was surprised that those machines were labeled at all,

17:57.780 --> 18:01.500
seeing the state of things. But I think we've been looking for

18:02.000 --> 18:05.660
like an hour or so. Meanwhile, helpdesk still frantically

18:06.160 --> 18:10.000
running around like headless chickens. But nothing in the rack really matched

18:10.500 --> 18:14.400
what we were looking for because you had an IP address, therefore you hopefully had

18:14.900 --> 18:19.080
a DNS name. Then we are looking for SRV005.

18:19.320 --> 18:22.440
Exactly. So we were basically checking the labels at this

18:22.940 --> 18:26.720
point. But the one that we had on file, spoiler alert,

18:27.220 --> 18:31.000
the file that I'm referring to was basically an Excel sheet that was on

18:31.500 --> 18:35.000
some local share hosted somewhere. But yeah,

18:35.080 --> 18:38.530
we couldn't really find it. Oh, we've had discussions

18:39.030 --> 18:42.650
here in previous episodes where we would have given a lot of money for

18:43.150 --> 18:47.050
a basic Excel sheet somewhere. So that's okay.

18:47.550 --> 18:50.890
Okay, so yeah, that was one thing that they kind of did okay-ish.

18:51.130 --> 18:54.330
Nice. We were staring at a rack full

18:54.830 --> 18:58.250
of cobwebs, ancient hardware, a lot of cable spaghetti, and sadness.

18:58.330 --> 19:01.490
And I just happened to stare closer into a

19:01.990 --> 19:05.850
suspicious opening about the size of two units tall. So the server

19:06.350 --> 19:09.970
that it was supposed to be running on was missing

19:10.470 --> 19:14.130
in action. As in it was gone. Instead, what I saw was

19:14.530 --> 19:18.450
just the saddest pair of cables that I've ever seen in a

19:18.950 --> 19:22.490
quote unquote professional environment. A VGA, a PS/2 and

19:22.990 --> 19:25.970
a power cable. Oh no, not even that. Oh dear.

19:26.050 --> 19:29.410
What I saw there was a patch cable. So

19:29.910 --> 19:33.530
far, nothing unexpected. And the other cable that I saw there was

19:34.030 --> 19:35.370
a micro USB cable. Just.

19:36.730 --> 19:39.610
That's it. Okay, awkward silence moment.

19:40.570 --> 19:43.930
Micro USB? USB C? USB A?

19:44.090 --> 19:47.730
Because if it was USB C, could have been a laptop. No, it wasn't USB

19:48.230 --> 19:51.770
C. And this was pre USB C laptop charger days. Oh dear.

19:52.270 --> 19:56.010
Yes. Please don't tell me that they did that. Yeah, I think

19:56.510 --> 20:00.650
you can kind of see where this is going. So eventually the cats came crawling

20:01.150 --> 20:04.720
out of the bag. And keep in mind, this was around the time that things

20:05.220 --> 20:08.600
like home automation and stuff like that and tinkering

20:09.100 --> 20:12.920
with hardware was getting popular. So apparently one of the engineers,

20:13.080 --> 20:16.800
bless his enthusiasm, had taken the machine that was

20:17.300 --> 20:19.720
connected there home to experiment and play around with.

20:20.359 --> 20:23.600
So it was basically running on a Raspberry

20:24.100 --> 20:27.400
Pi. This poses many questions, but also gives a lot of answers.

20:27.560 --> 20:30.640
Yes. As to why it was going so slow.

20:31.140 --> 20:34.990
Yes, exactly. It was just a basic Raspberry Pi. I think it

20:35.490 --> 20:39.350
was one of the first models even that was running the ticketing system,

20:39.910 --> 20:43.430
which was needed for about 200 field engineers,

20:43.590 --> 20:46.790
where they logged in every single day to get to their

20:47.290 --> 20:50.950
tickets. So if I understand correctly, you had

20:51.350 --> 20:55.270
200 engineers on a first or second

20:55.770 --> 21:00.200
generation Raspberry Pi doing your ticketing

21:00.700 --> 21:04.520
for all your field services. Yeah, pretty much. So I think the

21:05.020 --> 21:08.600
thing that actually saved it, given that it was under

21:09.100 --> 21:13.120
this kind of load was the NGINX proxy doing the caching.

21:13.620 --> 21:17.160
Yeah, basically. So at least that's my assumption for now,

21:17.660 --> 21:21.360
because I don't know the exact details of that setup,

21:21.860 --> 21:25.410
but sounds very plausible nonetheless. Nothing says production-grade

21:25.910 --> 21:30.210
infrastructure hosting an app for 200 field engineers like a €35

21:30.850 --> 21:33.490
hobby board normally used to teach kids Python.

21:35.970 --> 21:39.849
This podcast features Jack Smith and guests. We say thank you to

21:40.349 --> 21:43.730
our demoscene friends who helped make this podcast possible: Commander Homer

21:44.230 --> 21:47.570
for editing, Danko for music and audio advice, NetPoet for

21:48.070 --> 21:48.850
additional voiceovers.

22:05.989 --> 22:09.550
To be fair, that engineer also assumed that no one would be insane

22:10.050 --> 22:13.590
enough to go from test deployment on a Pi to production because it works.

22:14.230 --> 22:17.620
I can't be too upset at that specific engineer,

22:18.120 --> 22:21.340
to be honest. Well, you don't know who set it live there. It could be

22:21.840 --> 22:24.940
the engineer, could be someone else that says, oh, this was made in test,

22:25.440 --> 22:29.020
it works. Let's just push it out and we'll move it later. Yeah, exactly.

22:29.520 --> 22:33.140
So finding the guilty person for this will be a different endeavor.

22:33.640 --> 22:37.020
Yeah, I think there's better use of your time than trying to

22:37.520 --> 22:40.620
figure out what exactly. As we have mentioned a few times in

22:41.120 --> 22:44.840
episode 10 when Anonymous attacks on the DDoS, your focus

22:45.340 --> 22:48.960
is to get back online and not to finger point in a period of

22:49.460 --> 22:52.760
crisis. So yeah, indeed. Starting to point fingers

22:52.840 --> 22:56.280
during your emergency meetings and trying to already

22:56.440 --> 23:00.360
throw people under the bus. Wrong place, wrong time. Let's get your

23:00.440 --> 23:03.160
200 field guys back online now.

23:03.660 --> 23:06.040
Going out on a ledge here, because it's a Raspberry Pi.

23:06.680 --> 23:09.840
Do I have to wish for a backup?

23:10.340 --> 23:13.680
Because the storage went with the device. Yeah,

23:13.760 --> 23:17.160
so funny thing about that is that we were

23:17.660 --> 23:21.200
kind of just in time, I think, to get the SD card.

23:21.700 --> 23:25.600
Yes, the SD card back. So it's really fun to run a SQLite

23:26.100 --> 23:29.440
database on an SD card. So you found the actual SD card?

23:29.940 --> 23:33.360
Yes, where the engineer was like,

23:33.860 --> 23:37.040
oh yeah. Because that's how we found out that the engineer took it home.

23:37.540 --> 23:40.310
It's like, oh, so the guy that took it home was still there?

23:40.390 --> 23:42.070
Yes, sure,

23:44.150 --> 23:47.270
yeah. Like I said, cowboy IT. I mean, it's.

23:47.590 --> 23:50.950
Yeah, this goes for a yippee-ki-yay.

23:51.190 --> 23:54.990
Yeah, exactly. So yeehaw and let's go. But we

23:55.490 --> 23:58.630
got the SD card back, which thankfully wasn't formatted.

23:59.110 --> 24:02.950
That specific engineer was like, I'm just going to use a different SD

24:03.450 --> 24:06.990
card, you know, just in case. So again, the guy wasn't incompetent

24:07.490 --> 24:10.690
or anything like that. Oh no, no, no, it's: oh yeah,

24:11.190 --> 24:14.890
this is my old test Raspberry Pi. I'll just take it out and put it

24:15.390 --> 24:18.330
back in the drawer and I have a bigger SD card for it now,

24:18.830 --> 24:22.370
so I'll just take out the old one. Yeah, pretty much that. So. Oh dear,

24:22.850 --> 24:26.490
we got that SD card back. Of course, I do not want to know the

24:26.990 --> 24:30.290
write cycles on that thing, especially since it's a SQLite database.
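
NOTE
Editor's sketch, not part of the spoken episode. Since the only copy of the
ticket data lived in a SQLite file on an SD card, a periodic copy to another
disk would have lowered the stakes. A minimal version using Python's built-in
sqlite3 backup API could look like this. The function name and both paths are
hypothetical, and nothing here reflects how the real system was set up.
```python
import sqlite3
# Hypothetical helper: the paths are placeholders, not the real setup.
def backup_sqlite(src_path, dest_path):
    """Copy a SQLite database to dest_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Copies every page of the source database into the destination.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```
The destination should sit on different storage than the source, otherwise
the backup leaves the building together with the device.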

24:30.450 --> 24:34.130
Well, yes. And then we ended

24:34.630 --> 24:38.020
up migrating the app to a dedicated VM with

24:38.520 --> 24:41.980
backups this time and a Postgres database.

24:42.480 --> 24:45.820
Like a real-life grown-up application on an actual ESX

24:46.320 --> 24:50.020
server that stayed in the building. The funny thing is that we also managed to

24:50.520 --> 24:53.739
convince finance to fork over the cash for a

24:54.239 --> 24:58.500
semi-decent box to migrate all the other legacy applications

24:59.000 --> 25:02.980
to, so that the poor hardware that was running there for over 25

25:03.480 --> 25:07.500
years could finally be put to rest. The typical

25:08.000 --> 25:11.400
scenario of: no, no, it's all okay, it's fine, we don't need a maintenance

25:11.900 --> 25:14.520
contract. And then when things go poof. Yeah,

25:15.020 --> 25:17.720
suddenly everything is possible because they do realize that.

25:18.440 --> 25:21.400
Oh yeah, this is useful. Yes, exactly.

25:21.480 --> 25:24.600
So it's an FAFO moment, kind of. And that's

25:25.100 --> 25:29.200
basically how it went as far as I know. I think this

25:29.700 --> 25:33.320
specific application is still running unless they move to something completely different.

25:33.820 --> 25:37.000
But when I left there, it was running happily on that new machine. So.

25:37.700 --> 25:40.780
Well, as long as the guy who maintains it is still there or

25:41.280 --> 25:45.540
is reachable, I can imagine that indeed it is going to be still

25:46.040 --> 25:49.780
in use. And well, as you also move to more

25:49.860 --> 25:53.020
powerful hardware. Did the field engineers come back with, wow,

25:53.520 --> 25:57.620
this is now going so blazingly fast. Yeah, I think they had some

25:57.860 --> 26:01.380
very good feedback when it came to the speed

26:01.880 --> 26:05.050
of the application. So I'm pretty sure that the field engineers were

26:05.550 --> 26:09.890
happy at that point. Yeah, I'm now wondering about people listening,

26:10.610 --> 26:13.970
suddenly realizing: oh crap, we have a Raspberry Pi in

26:14.470 --> 26:18.050
the data center. And now it's okay

26:18.550 --> 26:21.970
because I've used them as well for a network probe or logging

26:22.470 --> 26:25.770
that you just plug it into the network and you plug it into a switch

26:26.270 --> 26:29.690
port where you give it all the VLANs so that you can do tracking

26:30.190 --> 26:34.200
for performance and everything, and just some logging, that's perfectly fine. And even, yep,

26:34.700 --> 26:38.360
if you want to test something out quickly and you have something specifically with the

26:38.860 --> 26:42.920
later Pi 3s and 4s, they're okay-ish to

26:43.420 --> 26:47.360
do that. But yeah, you should never run that in production. Although I see many

26:47.600 --> 26:51.360
social media posts and YouTubes on I made myself

26:51.860 --> 26:55.400
a Raspberry Pi file server cluster with some additional boards

26:55.900 --> 26:58.920
where you then put SSDs on it or similar. I'm not sure

26:59.420 --> 27:02.690
that is a good idea. Although yes, it has more CPU power than your

27:03.190 --> 27:07.130
average Synology. Yeah, and there have been some issues with Synology

27:07.210 --> 27:11.010
lately as well to give an example. But yeah, that's a completely

27:11.510 --> 27:14.250
different story. Of course I am an avid Synology user myself.

27:14.730 --> 27:18.210
Yes, it is not perfect, but it is so idiot

27:18.710 --> 27:21.410
proof that I can work with it. Fair enough. So to jump back on the

27:21.910 --> 27:25.530
Raspberry Pi. Okay, it happens. It happened. It was resolved.

27:25.970 --> 27:30.130
Yes, but again only when things went poof. Yeah, always take

27:30.630 --> 27:34.130
precautions before it's too late. And this is as I mentioned

27:34.630 --> 27:38.250
it earlier, having been in, well, not with subcontractors, but when you're

27:38.750 --> 27:42.450
doing mergers and acquisitions, you also analyze

27:42.950 --> 27:46.690
the local data center. Is it a closet? Is it a cabinet? Is it secure?

27:47.250 --> 27:51.090
Is there air conditioning? Is there a backup? And very

27:51.170 --> 27:54.810
often you come back with no. It's only when you're starting to

27:55.310 --> 27:58.390
get into a few hundred people organization that there is time and

27:58.890 --> 28:01.910
budget for that. Otherwise you start out with a company of 5 people,

28:02.410 --> 28:06.390
10 people, and it grows organically and you end

28:06.890 --> 28:10.350
up with this spaghetti and old servers standing there waiting

28:10.850 --> 28:14.230
to be shut down. Which never happens until, you know, somebody comes up and

28:14.730 --> 28:18.110
cleans up or until something goes really, really bad, as

28:18.610 --> 28:22.070
was the case here. If I want to blame, not

28:22.570 --> 28:25.470
blame, but if I would have been the guy writing the report here,

28:25.550 --> 28:28.750
the question would be: okay, why hasn't this been noticed?

28:29.250 --> 28:32.930
Why didn't the telco that this subcontractor was working for perform an

28:33.430 --> 28:37.410
audit of its subcontractor's IT systems to ensure that their other subcontractor

28:37.910 --> 28:40.930
had their application available when it needed to be. And you find that out the

28:41.430 --> 28:44.610
moment you do a site visit and you see your data room guarded

28:45.110 --> 28:48.970
by Linda, the key master. That is one big red flag that

28:49.470 --> 28:53.450
needs to be addressed. And that didn't happen or wasn't picked up. In my

28:53.610 --> 28:57.530
experience, what usually happens here, if your quote unquote data center

28:58.030 --> 29:01.890
is a glorified storage room with a matching broomstick and bucket

29:02.390 --> 29:05.490
and it can be opened with a key from a filing cabinet. That's straight from

29:05.990 --> 29:09.450
the 90s. We've had, in, was it episode five,

29:09.950 --> 29:12.890
Paris Rooftops, we had the network going out because.

29:13.390 --> 29:16.450
Exactly. In the maintenance cabinet, the cleaning crew would

29:16.950 --> 29:20.450
unplug the switch for the mopping machine of the floor and then plug it back

29:20.950 --> 29:24.450
in. So, yes, this is what happens.

29:24.950 --> 29:29.090
Unfortunately, what you're saying now is something that I've heard

29:29.490 --> 29:32.770
way too much before. So wait until you've lived through it. Yeah,

29:32.850 --> 29:36.610
a few times. For the love of God, don't host production

29:37.110 --> 29:40.730
applications on a Raspberry Pi. I mean, like you said, they're good for, like, monitoring

29:41.230 --> 29:45.210
and stuff like that. And I actually use one at home for a headless

29:45.710 --> 29:49.820
server that I'm running, which I use as a VNC server or if it's completely

29:50.320 --> 29:53.940
down or like a jump host. Yeah, exactly. So for

29:54.440 --> 29:58.140
those things, perfectly fine, even for things like home automation,

29:58.640 --> 30:01.860
but that's in a home setting. I've got a few Raspberry Pis left and right

30:02.360 --> 30:05.700
on customer locations where they're being used exactly as

30:06.200 --> 30:09.900
a jump host with a VPN server on it so that if need be,

30:10.400 --> 30:13.780
it will save you a two hour drive to walk in and restart the

30:14.280 --> 30:18.140
service. Yeah, so yes, that's what they're ideal for. And give your guys

30:18.640 --> 30:22.140
a checklist: once every month, log in, do apt-get update so that

30:22.640 --> 30:25.980
everything is patched and everything. And then you're okay. Of course, not in

30:26.480 --> 30:29.820
enterprises, but small businesses, no problem. It works. Everything is better

30:30.320 --> 30:33.940
than a mini tower case under a desk. Yeah, of Linda.

30:34.180 --> 30:37.660
Exactly. Good, good. Philip and Linda,

30:38.160 --> 30:41.760
thank you for your input and sharing everything here. Yeah, you're very welcome.

30:42.260 --> 30:46.000
Absolutely. A pleasure having you here. So this has been IT Horror Stories

30:46.500 --> 30:49.960
with Jack Smith. As always, you can find us on Spotify, Apple Music,

30:50.460 --> 30:53.800
YouTube, Deezer and wherever you find your podcasts. New episodes

30:54.300 --> 30:58.560
at least once a month. We have merchandise. You can find the merch at shop.it

30:59.060 --> 31:02.480
horrorstories.eu, or support us via Ko-fi.

31:02.980 --> 31:06.880
If you are in the US, you can change your shop region at the bottom

31:07.170 --> 31:10.250
of the page. We have social media: Instagram, TikTok,

31:10.750 --> 31:14.050
Facebook, LinkedIn, Bluesky and Mastodon, and all

31:14.550 --> 31:18.450
other links are on our website, ithorrorstories.eu.

31:18.610 --> 31:22.250
Thank you for tuning in. Until next time. And don't forget you're

31:22.750 --> 31:23.570
one of us. Bye bye.

31:29.650 --> 31:33.090
The content of this podcast is intended for entertainment purposes

31:33.590 --> 31:37.600
only and is meant to humorously explore various tech-related situations.

31:37.840 --> 31:41.440
Any resemblance to actual events or real persons, living or

31:41.940 --> 31:45.280
dead, is purely coincidental. We ridicule situations,

31:45.440 --> 31:48.640
never individuals or groups. Listener discretion is

31:49.140 --> 31:52.480
advised and we encourage everyone to approach technology with a sense

31:52.980 --> 31:54.080
of humor and an open mind.
