
If you find yourself with some Cisco IR809G and probably other Cisco IR800-series industrial integrated services routers from eBay, you probably find yourself needing the DC power plug/connector. It took me a while to find these as the Cisco part number is useless in the 3rd party world.

Cisco IR809G power terminals

Mouser lists them as “Pluggable Terminal Blocks 3.81mm euro plug 4 position” made by Molex, Mouser P/N 538-39519-0007, which is where I got mine for $2.38 as of 2024. I guess the pin pitch is the important thing. I was only familiar with 5/5.08mm Phoenix connectors, which have no septum thingy between pins, whereas on the 3.5/3.81mm ones each pin is enclosed individually. Also TIL Phoenix, Europlug, and Euroblock connectors are all the same thing depending on who you ask and who makes them.

Radwell lists them as the 29-6115-01-A0 for $16, ooof. Which is still better than the dude I saw selling them on eBay for $45.00 each!

I’ve been tinkering around with other vintage projects the last several months. Many of the photos land over on Flickr but I’m pretty sure the googles never index them, so they’re largely undiscovered and I haven’t written widely about them and why I care. So here’s my “life story before the recipe” list.

 

Logitech ScanMan 32

Trying out the ScanMan 32 on my Macintosh IIsi

[photos – flickr: 2024-10 Logitech Scanman32]

This is a hand-held greyscale (32 shades of grey!) scanner that came out in 1992-ish. You had to move it down a page by hand, could only scan about 5″ wide, and only up to 400 DPI. Dad bought one for some reason and I don’t really remember the reason, either he wanted to OCR scan books or notes and it didn’t work like he expected, or he just saw it at Sam’s and wanted it to tinker with. At the time we had a 286 computer with a CGA display and while it worked with the computer, it didn’t work all that well. OCR was done by a DOS program called Catchword. It was the first time I had ever used OCR and it seemed somewhat magical, it still left a lot of hand editing to clean up text.

Graphics and photo scanning was rough, at 300-400 DPI this was greatly more resolution than what the CGA monitor could display. Something just an inch or two wide could easily spill off the side of the screen, and it was a lot of work to attempt to stitch together a single page. Eventually a few years later when I had a faster computer with a VGA monitor I put it to use scanning various clipart and had my own little library of .PCX files I made computer catalogs with. It was quite satisfying to embed a filename in WordPerfect 5.1 for DOS (this was before WYSIWYG) and see a rudimentary photo come off the dot matrix printer.

 

A couple of summers ago I found the thing in a box in the barn where it had been sitting probably for 20 years. It looked like it was in good condition so I was curious if it still worked. It didn’t have the ISA interface card so I wound up eBaying one. Eventually this year I finally got around to trying to hook it up to my 486. I installed an old DOS version of Logitech GrayTouch 1.0 and discovered the thing indeed still worked!

I quickly realized old versions of the software that went with this are missing and hard to find. The only version of GrayTouch I could find was on archive.org, the Dutch version at that. From my old tape backups I had GrayTouch 2.0 but not the whole installer set. I found somebody on eBay who was selling it — at $15 a disk and it was a 3-4 disk installation.

Settings for the ISA card are almost lost too. None of the manuals are on Archive.org. I found one page “Trevor’s Unofficial Q&A Page – Logitech Software FAQs” on archive.org that had the DIP switch settings for the ISA interface card. These set the I/O base address and the software has to know about it.

Logitech Scanman Plus ISA Controller Board 200074 DIP switches

Scanman Plus ISA DIP switches board 200074

While in the process of researching how to get this thing going again, I found out there was a Macintosh version of it. Specifically, a SCSI interface box that hooked up to the same ScanMan 32 and let you plug it into the SCSI port of a classic Macintosh. Of course now that I have a SE and IIsi, I had to try it out. I found the box, “H7M-1” on eBay for $20 and decided to try it out. It also included another ScanMan 32 and Mac manuals. So now I have two of these damn things.

The Mac software was a little easier to find and was fairly straightforward getting running.

I have manuals for Logitech PaintShow Plus, ScanMan Mac, and ScanMan Plus, I need to get them fixed up and uploaded to archive.org. One looks like it was wet, but otherwise legible. If I run across a good version of the DOS manuals and software, I for sure want to nab it to scan and archive.

I’ve pondered about making a video of this scanner in action on PC and Mac, there’s not many about it. Incidentally Cathode Ray Dude came out with a video about scanners and briefly mentions the Logitech ScanMan, so maybe that’s all the coverage it needs.

Harris TS22ALO butt-set

[photos: flickr – 2024-10 Harris TS22ALO repair]


Ever since the ISP days I wanted a butt-set to test phone connections with. They were a few hundred dollars so I never bought one and instead carried around a $12 princess phone. I recently decided I’m an adult and I can buy one if I want to! I bought this on eBay for like $25, “not working, parts only”. Doing a little reading revealed these things have two batteries. One is the normal 9 volt battery that provides working voltage, another is a CR2032 battery that serves internal functions and if it dies the unit is inoperable.

There’s like one dude on YouTube that has videos of repairing these things and he completely skips over the part of how to actually take them apart. It’s very much draw the rest of the fucking owl. I took several photos along the way so that’ll have to do in lieu of my own teardown video.

The goal is to open the thing up and replace the CR2032 battery that’s inside it. The problem is these handsets are designed to be dropped off a 20′ telephone pole into a lake and survive, so they’re very ruggedized. The entire PCB and all the components, including the battery, are coated in this thick, goopy, rubber-ish plastic coating that’s a complete pain in the ass to get off. It looks like hot glue and you can take little nibbles with needle nose pliers, but it’s not hot glue and you can’t melt it. Cutting it dulls blades pretty fast too. I don’t know if there’s anything like acetone or gasoline that might dissolve it. If only YouTube repair guy would tell us his secrets.

I eventually started slicing around the battery with a box cutter, very much cutting away from my fingers. I would get a slab of it and start twisting it over my needlenose pliers and eventually was able to peel away the stuff chunk by chunk to get the battery exposed.

The battery holder is spot welded directly to the battery so it’ll have to be replaced. There’s three points of contact with the PCB, two posts on the edge (positive), and one underneath (negative).

Negative lug cut on battery

I bought some cheap CR2032 enclosures from Amazon, wired up a jumper so both positive contacts on the PCB made connection plus the negative.

Put it all back together and it works! The audio is kind of scratchy, I don’t know if parts have drifted out of spec or it was always like this. But it sure is loud and I won’t have any problems hearing it in a machine room!

DHCP OFFER with both lpxelinux.0 and grubx64.efi boot-file-names

TIL a DHCPv4 server can respond with two different TFTP boot-file-names in a single DHCPOFFER packet. And how the second filename can get corrupted with extra junk that shows up as a PXE client trying to download a slightly wrong file from your TFTP server.

TFTP request with 0xFF at the end of the filename

The latter I’ve seen before but I don’t think I actually dug into trying to figure it out. Again, more interesting stuff I’ve uncovered switching from ISC DHCP to ISC Kea. Here I will try to explain where the mangled TFTP filename came from and how to avoid it.

I was trying a DHCPv4 server configuration to support both UEFI PXE clients and some old legacy BIOS-based motherboards. In old ISC DHCP this is usually done with a class to match on the vendor class or the processor architecture (code 93). If it’s 0x00 0x07, return in the DHCP OFFER a file-name of a UEFI network boot program such as syslinux.efi or bootx64.efi, else return a file-name of something like lpxelinux.0:

# ISC DHCP
class "pxeclients" {
  match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";

  if option arch = 00:07 {
    filename "/efi64/syslinux.efi";
  } else {
    # PXELINUX >= 5.X is the new hotness with HTTP/FTP
    filename "/bios/lpxelinux.0";
  }
}

I was trying to do this same thing over in ISC Kea using a client-class:

  "Dhcp4": {
  ...
    "boot-file-name": "/bios/lpxelinux.0",
    "next-server": "192.168.130.10",
  ...
    "client-classes": [
      {
        "name": "grubx64_efi",
        "test": "option[93].hex == 0x0007",
        "option-data": [
          {
            "name": "boot-file-name",
            "data": "/efi/grubx64.efi"
          }
    ...
  ...

Except when I tried to UEFI PXE boot my system over IPv4, two unexpected things happened:

TFTP request with 0xFF at the end of the filename

Wireshark from the tftp server showing the request filename

First, the UEFI TFTP client was asking for a filename with an extra character (0xFF) at the end. This showed up both in syslog for the tftp server and in a packet capture on the tftp server. Others on the internet have mentioned other trailing characters such as Unicode U+FFFD. This was causing PXE booting to fail because the target system couldn’t fetch the bootloader program. In this case I’m still testing with a SuperMicro A1SAi motherboard as in prior posts.

Second, when I ran packet captures to verify the filename being sent in the DHCP OFFER to make sure it wasn’t garbage, there were TWO boot filenames being returned in two different spots in the same packet! Both my /bios/lpxelinux.0 and /efi/grubx64.efi paths were being offered. wtf?

I started searching around and found these two enlightening threads on the Mikrotik forums and on the Ubiquiti forums that addressed my weird filename format. Others have seen this behavior too, and it shed some light on the problem. It comes down to this: if the boot file-name is included as an option (this part is key, in this case option 67), some UEFI PXE TFTP implementations expect it to be a null-terminated string, whereas the DHCP server sends it with only a length byte, and the next thing in the packet happens to be the end-of-options marker, 0xFF. In other words, the UEFI should be respecting the data length field and terminating the string appropriately instead of reading too many bytes.

Thus what I was seeing was the UEFI reading beyond the expected end of the filename, including the marker and then trying to TFTP request the file “grubx64.efi<FF>”.
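
To make the failure mode concrete, here is a minimal Python sketch of the two behaviors. The byte string is a hand-built stand-in for the tail of an options area, not a capture from my network, and both function names are made up for illustration. The "buggy" reader is only my guess at what the firmware is effectively doing, not its real code.

```python
# Hand-built tail of a DHCP options area: option 67 (0x43), a length byte,
# the path itself, then the END option (0xFF). Not taken from a real capture.
PATH = b"/efi/grubx64.efi"
OPTIONS_TAIL = bytes([67, len(PATH)]) + PATH + b"\xff"


def read_option_67_by_length(buf: bytes) -> bytes:
    """Honor the length byte: take exactly `length` bytes of option data."""
    assert buf[0] == 67
    length = buf[1]
    return buf[2:2 + length]


def read_option_67_as_c_string(buf: bytes) -> bytes:
    """Assumed buggy behavior: read until a NUL byte or the end of the buffer."""
    assert buf[0] == 67
    data = buf[2:]
    nul = data.find(b"\x00")
    return data if nul == -1 else data[:nul]


print(read_option_67_by_length(OPTIONS_TAIL))    # the filename only
print(read_option_67_as_c_string(OPTIONS_TAIL))  # the filename plus the 0xFF marker
```

The second reader never finds a NUL, so it swallows the end-of-options marker, which matches the “grubx64.efi<FF>” request I saw on the TFTP server.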

This got me into reading up on the format of DHCP OFFER packets and I discovered the second issue. In RFC2131, DHCP OFFER headers have fixed-length fields for “siaddr”, the “next-server” or TFTP server IP address, “sname”, an optional server hostname, and “file”, a 128-byte field that holds a boot filename. The sname and file fields are null-terminated strings.

RFC2131 DHCP format

HOWEVER, in RFC2132 which lays out the various DHCP options that can be specified we get to option 67. This specifies a DHCP option that “is used to identify a bootfile when the ‘file’ field in the DHCP header has been used for DHCP options.” Here the raw format is the option code 67 (0x43) + the length of the filename + filename. Note the lack of null termination used.

The way I read the RFC this says the TFTP filename can either be in the original DHCP OFFER header, a/k/a the “fixed fields”, or specified later as a variable-length DHCP option, but not both at the same time.
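
For anyone who wants to poke at the fixed fields themselves, here is a rough Python sketch of where they sit in the 236-byte RFC2131 header. The demo packet is fabricated from the addresses and path in my config (the xid is made up), and `c_string` and `fixed_boot_fields` are hypothetical helper names. It ignores the options area and option overload entirely.

```python
import ipaddress
import struct

# RFC 2131 fixed header, in order: op, htype, hlen, hops, xid, secs, flags,
# ciaddr, yiaddr, siaddr, giaddr, chaddr[16], sname[64], file[128].
HEADER = struct.Struct("!BBBBIHH4s4s4s4s16s64s128s")  # 236 bytes in total


def c_string(field: bytes) -> str:
    """Decode a fixed-size, NUL-padded field up to its first NUL byte."""
    return field.split(b"\x00", 1)[0].decode("ascii")


def fixed_boot_fields(packet: bytes) -> dict:
    """Pull the three boot-related fixed fields out of a DHCPv4 packet."""
    fields = HEADER.unpack_from(packet)
    return {
        "next-server": str(ipaddress.IPv4Address(fields[9])),  # siaddr
        "sname": c_string(fields[12]),
        "file": c_string(fields[13]),
    }


# Fabricated OFFER header; struct pads the short sname/file values with NULs.
demo = HEADER.pack(
    2, 1, 6, 0,              # op=BOOTREPLY, htype=Ethernet, hlen=6, hops=0
    0x12345678, 0, 0,        # xid (made up), secs, flags
    bytes(4), bytes(4),      # ciaddr, yiaddr (left as 0.0.0.0 here)
    ipaddress.IPv4Address("192.168.130.10").packed,  # siaddr
    bytes(4), bytes(16),     # giaddr, chaddr
    b"", b"/bios/lpxelinux.0",
)

print(fixed_boot_fields(demo))
```

Because the `file` field is a fixed 128 bytes padded with NULs, a client that treats it as a C string gets a clean filename for free, which is not true of option 67.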

This seems to be a source of a lot of confusion for people trying to troubleshoot their PXE boot configurations. It seems many like myself do not know there are two fields and keep hammering away fiddling with filenames and it’s not clear which one they’re setting.

Bonus: see below when I try to add on some dummy Option 68 data, still breaks

This got me back to reading the Kea docs again to find out what was wrong with my configuration. I caught on to the fact I was using a global “boot-file-name” and then specifying “boot-file-name” again as option 67 in my client-class.

The configuration options in Kea are literally named the same thing and should be the same thing, right? RIGHT??

No, it turns out buried in 8.2.18.1 Setting Fixed Fields in Classification they are very much different. It turns out that in order to set the boot-file-name in the OFFER header, I needed to ditch the option-data block and set “boot-file-name” again directly in the client class, like this:

  "Dhcp4": {
  ...
    "boot-file-name": "/bios/lpxelinux.0",
    "next-server": "192.168.130.10",
  ...
    "client-classes": [
      {
        "name": "grubx64_efi",
        "test": "option[93].hex == 0x0007",
        "boot-file-name": "/efi/grubx64.efi"     <<< note not in an option-data block
      }
    ...
  ...

I guess technically if the header was full then it would make sense to call this field the same name since it should serve the same purpose.

Also for whatever reason the examples in the Kea documentation mention things like "boot-file-name": "/dev/null" which might lead you to believe this leaves the field empty. But no, it quite literally sends the string /dev/null as the filename sent to the target server in the DHCPOFFER.

Winning!

This gets us back to returning a single TFTP boot file-name in the first part of the DHCP OFFER packet, it’s null-terminated, and when the target system UEFI PXE boots, it’s requesting a valid filename. And in this case the client-class test does the right thing, it detects the target system is UEFI and sends the /efi/grubx64.efi boot-file-name instead of /bios/lpxelinux.0.  Winning!

Wireshark of DHCP OFFER with only grubx64.efi

and here’s the happy server:

>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv4.
  Station IP address is 192.168.135.28

  Server IP address is 192.168.130.10
  NBP filename is /efi/grubx64.efi
  NBP filesize is 2541096 Bytes

>>Checking Media Presence......
>>Media Present......
 Downloading NBP file...

  Succeed to download NBP file.

 

But why?

While this fixes my problem, it doesn’t address the seeming impedance mismatch between what DHCP RFCs say how the filename is specified and why UEFI seems to do its own thing by tacking on extra characters such as 0xFF. Surely these two standards groups must talk to each other?

Cracking open the UEFI 2.6 Specification, my favorite reading as of late, it’s mentioned in “Network Protocols – ARP, DHCP, DNS, HTTP and REST”. Here in EFI_DHCP4_HEADER it mentions BootFileName[128]. Then right after in EFI_DHCP4_PACKET_OPTION it clearly mentions the format of “option code + length of option data + option data”. So the format of options as mentioned in RFC2131/2132 is acknowledged here. But it really doesn’t mention string termination, and I assume that’s left as an implementation detail.

The PXE Specification doesn’t really mention string termination either.

RFC2132 clearly states that we shouldn’t be adding our own null termination in DHCP Options. That is, we shouldn’t be trying to set boot-file-name to something like “/efi/grubx64.efi\0” in an attempt to trick the UEFI into using the “correct” filename.

Options containing NVT ASCII data SHOULD NOT include a trailing NULL; however, the receiver of such options MUST be prepared to delete trailing nulls if they exist. The receiver MUST NOT require that a trailing null be included in the data. In the case of some variable-length options the length field is a constant but must still be specified.

The open source UEFI reference implementation, Tianocore EDK II, takes the stance RFC2132 says it’s not guaranteed to be null terminated, which seems to conflict with this paragraph that says the option shouldn’t ever be null terminated to begin with. In any case, they take the boot-file-name from the DHCP OFFER and if it’s Option 67 they use the length of the string to null-terminate it, else if it’s the fixed-field just use it directly: (NetworkPkg/UefiPxeBcDxe/PxeBcDhcp4.c)

  //
  // Parse PXE boot file name:
  // According to PXE spec, boot file name should be read from DHCP option 67 (bootfile name) if present.
  // Otherwise, read from boot file field in DHCP header.
  //
  if (Options[PXEBC_DHCP4_TAG_INDEX_BOOTFILE] != NULL) {
    //
    // RFC 2132, Section 9.5 does not strictly state Bootfile name (option 67) is null
    // terminated string. So force to append null terminated character at the end of string.
    //
    Ptr8  =  (UINT8 *)&Options[PXEBC_DHCP4_TAG_INDEX_BOOTFILE]->Data[0];
    Ptr8 += Options[PXEBC_DHCP4_TAG_INDEX_BOOTFILE]->Length;
    if (*(Ptr8 - 1) != '\0') {
      *Ptr8 = '\0';
    }
  } else if (!FileFieldOverloaded && (Offer->Dhcp4.Header.BootFileName[0] != 0)) {
    //
    // If the bootfile is not present and bootfilename is present in DHCPv4 packet, just parse it.
    // Do not count dhcp option header here, or else will destroy the serverhostname.
    //
    Options[PXEBC_DHCP4_TAG_INDEX_BOOTFILE] = (EFI_DHCP4_PACKET_OPTION *)
                                              (&Offer->Dhcp4.Header.BootFileName[0] -
                                               OFFSET_OF (EFI_DHCP4_PACKET_OPTION, Data[0]));
  }

 

So at least their implementation does the right thing as far as we’re concerned, and doesn’t feed trailing characters to the TFTP server.

I know this isn’t the case with the UEFI on my SuperMicro test motherboard. If there’s a boot file name in option 67, it’s gonna get screwed up.

Notably this doesn’t seem to be a problem with DHCPv6 and PXE booting as DHCPv6 doesn’t use the same sort of fixed fields in DHCP ADVERTISE messages.

Further testing overruns with option 68

12/9: I wondered what happened if I added yet another option to my OFFERs that was right after Option 67. Would the UEFI loader figure out where to stop trying to read option 67, or would it keep reading beyond the end of the field? I configured Kea to send option 68, for “Mobile IP Home Agent”. The name and purpose doesn’t matter, I just wanted the next numerical option so the data would be adjacent in the packet.

Here’s what the new OFFER looks like with some dummy option 68 data:

DHCP OFFER with filename, Option 67, and Option 68

and here’s the hex representation of it in the packet:

Hex payload of Option 67 and 68

We have 0x43 (Decimal 67 for option 67), length 12, “testfilename”. Then immediately after we have 0x44 (Decimal 68 for option 68), length 4, followed by bytes of an IPv4 address c0-01-02-03 (192.1.2.3), and finally our 0xFF end terminator.

What does the Supermicro UEFI TFTP client do? Surprisingly, it reads beyond the end of option 67 and keeps going, using the option 68 data as part of the TFTP boot-file-name! All the way to the end of the DHCP packet again, including the 0xFF terminator.

UEFI reading both option 67 and 68 data for boot-file-name!

This shows up in the TFTP server log as the original “testfilename” and then the ASCII representation of the option 68 data.
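
Here is a small Python sketch of that payload, rebuilt by hand from the bytes described above (so treat it as my transcription, not the capture itself). `walk_options` is a hypothetical helper showing how a length-respecting parser splits the options. `overrun` just slices from the start of the option 67 data to the end, which is my assumption about what the firmware effectively reads.

```python
# Rebuilt from the hex described above: option 67, length 12, "testfilename",
# option 68, length 4, 192.1.2.3, then the END marker.
PAYLOAD = bytes.fromhex("430c") + b"testfilename" + bytes.fromhex("4404c0010203ff")


def walk_options(buf: bytes):
    """Yield (code, data) pairs, honoring each option's length byte."""
    pos = 0
    while pos < len(buf):
        code = buf[pos]
        if code == 0xFF:      # END: nothing meaningful follows
            return
        if code == 0x00:      # PAD: a single byte with no length field
            pos += 1
            continue
        length = buf[pos + 1]
        yield code, buf[pos + 2:pos + 2 + length]
        pos += 2 + length


options = dict(walk_options(PAYLOAD))
print(options[67])   # just the filename
print(options[68])   # the four address bytes

# Assumed firmware behavior: start at the option 67 data and never stop.
overrun = PAYLOAD[2:]
print(overrun)
```

The byte right after the filename is 0x44, which happens to be an ASCII “D”, followed by the length byte and the raw address bytes, which is why the TFTP log shows junk glued onto “testfilename”.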

Conclusions and workarounds

The TFTP filename getting stuff appended to the end seems to be yet another UEFI implementation bug as others on the internets claim. It would seem if you’re having this problem, your best bet is to avoid using DHCP Option 67 and work to configure your DHCP server so your boot-file option is being set in the DHCP OFFER header directly. In ISC DHCP this seems to be the plain “filename” directive. In ISC Kea, it’s the top-level “boot-file-name” as mentioned above. In dnsmasq (I haven’t personally tested this) it seems to be the “dhcp-boot” directive.

The Windows DHCP server seems to be a big source of confusion. Practically every example I find for Windows Deployment Services says to use Option 67, I’m not even sure if there’s a way to set the field in the header. I don’t have a Windows server handy to look at for reference.

The only advantage I can see to using Option 67 over the fixed-field name is that the fixed-field name is limited to 127 bytes, whereas Option 67 allows up to 255 bytes.

Another option is to UEFI PXE boot over IPv6 which avoids this problem altogether.

There’s certainly some clever workarounds out there such as making symlinks on the TFTP server so that for example “grubx64.efi<FF>” links to “grubx64.efi”. While that may work it seems too hackish even for me.

There may be a possibility of other UEFI things out there that need to chain boot and explicitly want Option 67. I don’t know offhand what those could be, but anyone can do anything in software.

Links

  • https://forum.mikrotik.com/viewtopic.php?t=58039
  • https://community.ui.com/questions/Network-Boot-adding-characters-to-file-name/cffe7862-dbc7-42e8-bb09-1ef3366fef9c
  • EDK II reference: https://github.com/tianocore/edk2/blob/master/NetworkPkg/UefiPxeBcDxe/PxeBcDhcp4.c
  • UEFI 2.6 Specification: https://uefi.org/sites/default/files/resources/UEFI%20Spec%202_6.pdf

										
				

Years behind schedule I finally got around to replacing ISC DHCP with Kea DHCP so I could finally have proper IPv6 host reservations. What I just learned, and should have learned years ago, is that several of my motherboards, such as the Supermicro A1SAi and Intel NUC, support UEFI PXE booting but do not support TFTP servers outside of their local /64 network. Doh! They will happily get an address via DHCPv6 from a DHCPv6 server on another network via a relay, that’s not a problem, but if the TFTP server is not on the same LAN the NBP download process times out and fails. It would seem that Linkedin learned this years ago too. This is similar in effect to my misconfigured DHCP server the other day, but not the same cause.

The only solution is to either have a TFTP server on the same LAN as the target system, or keep legacy IPv4 networking around so that the target system can use UEFI IPv4 PXE to boot something like syslinux.efi, or GRUB2, or iPXE, which in turn has IPv6 support and can finish downloading the kernel and initramfs over IPv6.

At first I thought I was doing something wrong in Kea (and I verified this with the old ISC DHCP), but no, packet captures prove that during UEFI PXE boot the system is making zero effort to send out Router Solicitations. It also tries to do Neighbor Discovery for IPv6 addresses that it should be sending to the default gateway, which implies it’s not honoring Router Advertisements that tell the system its prefix and prefix length. Or, it has some wild ideas as what it thinks are “on-link”, which is how IPv6 determines if something is on the same L2 network.

An example

Here’s a target system, Supermicro A1SAi-2550 with MAC address 0c:c4:7a:32:27:6c, trying to UEFI PXE boot over IPv6:

First, the Kea DHCP6 server configuration, just says here your IP address is 2001:470:8122:1::9, and go fetch grub2 using tftp at 2001:470:1f05:2c9::10:

    "client-classes": [
      {
        "name": "grub2_tftp_efi",
        "test": "option[61].hex == 0x0007",
        "option-data": [
          {
            "name": "bootfile-url",
            "data": "tftp://[2001:470:1f05:2c9::10]/efi/bootx64.efi"
          }
    ...
    ...
    "subnet6": [
    ...
    ...
                  "hostname": "basic09.wann.net",
                  "hw-address": "0c:c4:7a:32:27:6c",
                  "ip-addresses": [ "2001:470:8122:1::9" ],
                  "client-classes": [ "ikeacluster" ]
    ...

On boot, this is displayed on console:

>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv6..
  Station IP address is 2001:470:8122:1:0:0:0:9

  ....long 20 second wait...

  Server IP address is 2001:470:1F05:2C9:0:0:0:10
  NBP filename is efi/bootx64.efi
  NBP filesize is 0 Bytes
  PXE-E18: Server response timeout.

This tells us the target system did a successful DHCPv6 Solicit/Advertise/Request/Reply (S.A.R.R.) to Kea and understood the bootfile-url option in the DHCP6 response. But then it got zero bytes.

From the standpoint of the DHCPv6 and TFTP servers, there’s not much to see. The SARR process happens, and that’s it. Nothing tries to hit the tftp server at all.

From a packet capture of the router (:89:f0) facing the Supermicro system (:27:6c) we see:

Solicit XID: 0xe8a7e3 CID: 000100013875b6020cc47a32276c                               ok
Advertise XID: 0xe8a7e3 CID: 000100013875b6020cc47a32276c IAA: 2001:470:8122:1::9     ok
Request XID: 0xe9a7e3 CID: 000100013875b6020cc47a32276c IAA: 2001:470:8122:1::9       ok
Reply XID: 0xe9a7e3 CID: 000100013875b6020cc47a32276c IAA: 2001:470:8122:1::9         ok
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c                what!
Neighbor Solicitation for fe80::ec4:7aff:fe32:276c from fc:ec:da:4a:89:f0             < router :f0 asks who :6c is
Neighbor Advertisement fe80::ec4:7aff:fe32:276c (sol, ovr) is at 0c:c4:7a:32:27:6c    < :6c replies
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c                what!
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c                what!
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for fe80::feec:daff:fe4a:89f0 from 0c:c4:7a:32:27:6c            < :6c unicast-asks who's :89:f0
Neighbor Advertisement fe80::feec:daff:fe4a:89f0 (rtr, sol)                           < :f0 replies I am he, also I'm a router
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Release XID: 0xeaa7e3 CID: 000100013875b6020cc47a32276c IAA: 2001:470:8122:1::9       < :6c I give up
Reply XID: 0xeaa7e3 CID: 000100013875b6020cc47a32276c
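
If it helps to tell who is who in that capture, the link-local addresses line up with the two MAC addresses via the usual modified EUI-64 mapping. This is a quick Python sketch of that mapping with a made-up helper name. It only applies when a host actually derives its link-local address this way, which both of these appear to do.

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Build the modified EUI-64 link-local address for a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02                                     # flip the universal/local bit
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + iid)  # fe80::/64 + interface ID


print(link_local_from_mac("0c:c4:7a:32:27:6c"))  # the Supermicro
print(link_local_from_mac("fc:ec:da:4a:89:f0"))  # the router
```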

We see the Supermicro go through the whole SARR process. DHCP6 by design does not carry any router details or subnet/prefix information. It’s up to the target system to listen for Router Advertisements to find the prefix of the associated subnet of the LAN. In other words, the Supermicro assumes it is 2001:470:8122:1::9/128 until something tells it otherwise. Here the Supermicro did not make any sort of Router Solicitation. I’ve filtered it out for brevity, but the router was indeed sending out RAs every 4 seconds so it had ample time and had at least 5 go by in this time frame.

My hypothesis is that maybe the IP stack did receive an RA but just decided everything is “on-link” anyways? Or has a wildly wrong prefix misconfigured and thinks everything in the world is on the same network. For giggles I did try to fetch from a Comcast 2601:646:: network so it’s in at least a different /14, didn’t help. In any event, the Supermicro starts sending out Neighbor Discovery requests for the TFTP server at 2001:470:1f05:2c9::10 over and over, which is a completely different subnet on a completely different LAN.
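
The decision I would expect the firmware to make can be sketched in a few lines of Python. `next_hop` is a hypothetical helper, and the `::/0` case is only a stand-in for my "everything is on-link" hypothesis, not something I have confirmed about the Supermicro stack.

```python
import ipaddress

TFTP_SERVER = "2001:470:1f05:2c9::10"
ROUTER = "fe80::feec:daff:fe4a:89f0"


def next_hop(dest: str, on_link_prefixes: list, default_router: str) -> str:
    """Return the address a host should send Neighbor Solicitations for."""
    dest_addr = ipaddress.ip_address(dest)
    for prefix in on_link_prefixes:
        if dest_addr in ipaddress.ip_network(prefix):
            return dest            # on-link: resolve the destination directly
    return default_router          # off-link: resolve the router instead


# What should happen once the RA prefix is learned.
print(next_hop(TFTP_SERVER, ["2001:470:8122:1::/64"], ROUTER))
# What should happen with no prefix information at all.
print(next_hop(TFTP_SERVER, [], ROUTER))
# What the capture looks like: as if everything were on-link (hypothetical).
print(next_hop(TFTP_SERVER, ["::/0"], ROUTER))
```

Only the last case produces Neighbor Solicitations for the TFTP server's own address, which is what the packet capture shows over and over.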

It tries this for many seconds and eventually gives up. It’s nice enough to release the DHCPv6 lease before it returns to the boot menu.

How to fix?

I don’t know if there is a fix for this, at least one available to me. I’ve already tried upgrading the Supermicro BIOS which jumped it way ahead from a 2014 vintage to 2019. I’m sure Supermicro’s solution is “buy something newer”.

In the meantime I’m going to go back to booting GRUB2 over IPv4 and be mad about it.

A peek inside PXE – TianoCore EDK II

Googling for anything related to PXE booting is futile. Pages and pages of people way off the mark and no real definitive information. The UEFI 2.1 and 2.7 Implementation specifications are useful, they go into a lot of detail as to what should happen, but it’s up to others to actually write the code. Somehow I did stumble upon TianoCore EDK II (EFI Development Kit II), which from what I gather was Intel’s original EFI reference code that was open sourced and has now grown into its own reference UEFI codebase. TianoCore is the community, EDK II is the reference implementation.

I have no idea if Supermicro’s UEFI code is based off of EDK, it seems fairly sorta similar from what little I can see at least. Maybe not, because EDK supports UEFI HTTP boot and my Supermicro doesn’t. I’d give the Intel NUC better odds of using code from the same pedigree.

EDK II code is a fascinating read, especially the NetworkPkg/UefiPxeBcDxe code that shows an actual PXE implementation, “Start PXE over IPv6” and all. It answers a few questions, such as what’s the real format for bootfile-url options in DHCP (tftp://ip.address./path/path/file), or why the leading / slash gets chopped off paths, or the variety of code paths that get you to different PXE-Exx error codes.
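
As a rough illustration of that URL format, here is how the bootfile-url from my Kea config splits apart using Python's standard library. This mirrors what the console output shows (“NBP filename is efi/bootx64.efi”), not EDK II's actual C parser, and `split_bootfile_url` is a made-up helper name.

```python
from urllib.parse import urlsplit


def split_bootfile_url(url: str) -> tuple:
    """Split a bootfile-url into (scheme, server address, path without leading slash)."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path.lstrip("/")


print(split_bootfile_url("tftp://[2001:470:1f05:2c9::10]/efi/bootx64.efi"))
```

Note the square brackets around the IPv6 address: they are part of the URL syntax and get stripped when the host is extracted.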

Another cute thing I learned from the EDK code, and I’ve seen it on the NUC, is that every dot it prints after “Start PXE over IPv6” means the stack has sent a packet on the network.

[photos: flickr – Vintage dial-up modem teardowns]

[photos: flickr – Analog telephone adapters]

For several months I’ve been buying old popular models of dial-up modems from the 1990s to test how they fare over VoIP connections along with different analog telephone adapters. To my great annoyance maybe a quarter of them didn’t include an AC power adapter, so I had to do a bunch of sleuthing to figure out if the modem took AC or DC power, the voltage, the expected amperage, what type and size of power connector. What worked for one model is no guarantee it works for another similar one.

USR Courier I-modem AC transformer guts

For instance even between my USR Courier V.Everything modems, models 1868, 2806, 3453C, they came with AC step-down transformers that output 20 VAC, 9 VAC, or 15 VAC. The USR Courier I-modem AC adapter claims it has a 20 VAC output, but after getting weird output measurements on the pins, I cut open the impossible-to-find AC transformer and found it has a diode, which seems to imply it’s outputting half-wave rectified DC-ish power and that a much easier to find DC-only supply might work.

It looks like Retro Web doesn’t allow for documentation of external devices like modems, there’s no good collection of this information that I’m aware of. To help future generations avoid this problem, I started photographing and noting the details of every power supply in my collection. And for history’s sake I decided to open up the modems and make high-quality-ish photos of them too. Hopefully this will let people find cheap replacements for modems they buy or in the case of the Courier I-modem, find a workaround replacement because they are very rare.

At least one, such as the first gen USR Courier I-modem, had leaking electrolytic capacitors so I’ve taken extra photos of the caps to get size information. Unfortunately I am not yet an expert on circuit design, DSPs, and ROMs, so I don’t have much illuminating commentary or stories to tell about these modems.

For now I have all the teardown photos in a single, large Flickr album, organized by modem name/model.

I haven’t decided how I want to organize these, if I want to put together a modem wiki over on Tuxedocatbbs.com, or go for a more structured approach like Retro Web did. I have more information that goes along with them, either manuals I’ve scanned or dug up, replacement capacitor sizing, along with init strings used during my testing.

As for the testing itself, that’s a whole ‘nother post. I used Qmodem on my 486 to make thousands of calls to my BBS and do a 64 KB Ymodem download. For actually calling, handshaking, and connecting, surprisingly all of the modems have almost a 100% success rate over VoIP without any speed restrictions. Disabling V.92 quick connect is usually the only tweak I’ve had to make. However actually trying a download is where things start telling different stories and results vary widely. Preliminary test data and results are over on the BBS website: https://tuxedocatbbs.com/stats/ccr.txt

As of 11/2024 I have these modems up:

  • Cardinal 28.8k V.34 external 020-0458
  • Hayes Smartmodem Optima 9600 “Optima 96” 2003 AM
  • Hayes Smartmodem 2400
  • Hayes Smartmodem Optima 288
  • Motorola ModemSURFR 33.6
  • Motorola Premier 33.6
  • MultiTech MultiModem II MT1432BA
  • MultiTech MultiModem II MT2834BA
  • MultiTech Multimodem MT5634ZBA
  • SupraFAXmodem 144 LC
  • SupraFAXmodem 288
  • SupraFAXmodemPlus 2400
  • Telebit Netblazer PN V.32bis
  • US Robotics 56k V.90/x2 (basically Sportster)
  • US Robotics USR5637 USB
  • US Robotics Courier 56k Business Modem 3453C
  • US Robotics Courier I-modem ISDN with V.Everything
  • US Robotics Courier I-modem with ISDN/V.34
  • US Robotics Courier V.Everything 1868
  • US Robotics Courier V.Everything 2806
  • US Robotics Sportster 56k with x2
  • Viva 9600/4800 2400 bps data fax
  • Zoom VFX V.32bis

Have you seen this modem?

Wang 9648/24e

One of my very first modems was a Wang 9648/24e, a 2400 bps fax/modem that I bought at Walmart around 1993. I have found exactly one photograph of this model on the Internet. Barely anyone seems to remember Wang, much less that Wang made modems. It wasn’t particularly good or bad, just a pokey 2400. I even used it for years during my ISP days for credit card batch processing because higher speed modems had problems connecting to the processor. I tossed mine years and years ago, but if you come across one, send it to me! I thought the Viva 9600/4800 was a rebranded version, but after buying one it only looks vaguely similar and is most definitely nowhere near the same thing.

Update: 9:17 PM

Literally hours after I posted these, one just sold on eBay three hours ago! I’ve been keeping an eye out for it but guess I didn’t have a saved search for it.

Update 9:20 Oh it’s actually a 9696/24e which I’ve actually never heard of and looks slightly larger, so not exactly the same, but still so close!

 

Petcube Bites 2.0 teardown

This is my second Petcube Bites and after a few years of operation it stopped dispensing treats. Treats started getting jammed between the rotating loader head thingy and the slot leading to the launcher chute, and I’d have to empty it out and pick out the offending treat, only to have it jam the next time around. Using the little reducer didn’t seem to matter. The unit would growl and whirrrrr for 10-15 seconds before it timed out; it sounded like it had a stripped gear inside.

With no option left other than throwing it away, I opened it up to take a look:

2024-11 Petcube Bites 2 Teardown

[photos – flickr: Petcube Bites 2.0 teardown]

The unit was designed simpler than the 1st generation Petcube Bites. That one had a spring loaded flipper thing that I seem to recall just stopped working and I couldn’t fix it after I took it apart too. The 2.0 just has two motors, one for the “loader” at the top and another for the “launcher” on the side. I knew the 2.0 would launch treats with some force across my apartment, after opening it up I found out why. The launcher motor spins up at a pretty good clip the whole time while waiting for the loader to feed in treats, then turns off. The whirrrrring sound I heard seems to be the launcher motor running empty until it times out.

The first gen had sensors in the launcher chute, which I assume were there to tell if a treat dropped or not. On the 2nd gen both motors have a wheel that passes through opto-interrupters, which I suspect measure slight changes in RPM to figure out if a treat has been fed through.

Update 11/27/2024:

It’s jamming again. After scooping the treats out of the way and looking at the feeder mechanism, I think what is happening is that a treat is being plucked by the rotating head but is hanging out on the ledge of the chute. Instead of falling in it just hangs there and the motor keeps trying to crush it. I thought I’d seen the motor change direction before, but in this case it keeps twisting and twisting until it times out.

If it would just back off and reverse part of a turn to let the stuck treat drop, I think that would fix this.

 

Here’s one for the future troubleshooting seekers. I was testing IPv4 UEFI PXE booting a Supermicro A1SAi motherboard after applying the Atom 2550 fix and couldn’t get the thing to load the network bootstrap program (NBP). I’m not at all saying this is the only reason for hitting a PXE-E99 error, this is just what I hit today.

This blipped by on VGA console so fast I had to use slo-mo on my phone to capture it. (With the Atom 2550 fix, console redirection is lost, so no serial console scrollback).

Checking Media Presence.....
Media Present....
Start PXE over IPv4. Press ESC key to abort PXE boot.
Station IP address is 192.168.135.29

Server IP address 192.168.130.10
NBP filename is /efi64/syslinux.efi
NBP filesize is 0 Bytes
PXE-E99: Unexpected network error.

My PXE environment is pretty set in stone; it’s configuration managed and doesn’t get changed willy-nilly.

Things that came to mind:

  • Wrong EFI binary? Did this particular firmware want some weird 32-bit EFI program? Possible, but I had a near-identical motherboard with an older firmware that loaded the x86-64 binary just fine. Plus I’m near 100% certain I’ve used this same binary on this same motherboard before it died.
  • TFTP server broken? No, I was able to fetch syslinux.efi on several other machines on the same LAN just fine.
  • UEFI doesn’t support routing / an off-network TFTP server? This seemed feasible, yet I’m absolutely certain I’ve PXE installed these on different LANs than the TFTP server.
  • Does the UEFI firmware not like it when two DHCP servers respond? While both of my DHCP servers run identical configuration files and reservations, this was easy to test by stopping one temporarily. Didn’t help.

Tcpdumping on both the TFTP server and the router facing the LAN this motherboard was attached to showed there were absolutely no attempts on the wire to fetch anything over TFTP.

The only thing left was DHCP. The “unexpected network error” got me digging into the DHCP responses being sent back:

01:35:21.075721 IP (tos 0x0, ttl 64, id 4266, offset 0, flags [DF], proto UDP (17), length 354)
    192.168.130.12.67 > 192.168.135.1.67: [bad udp cksum 0x8bbe -> 0xd3ef!] BOOTP/DHCP, Reply, length 326, hops 1, xid 0xafb187ee, Flags [Broadcast] (0x8000)
          Your-IP 192.168.135.29
          Server-IP 192.168.130.10
          Gateway-IP 192.168.135.1
          Client-Ethernet-Address 0c:c4:7a:32:27:e0
          file "/efi64/syslinux.efi"
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: ACK
            Server-ID Option 54, length 4: 192.168.130.12
            Lease-Time Option 51, length 4: 86400
            Subnet-Mask Option 1, length 4: 255.255.255.0
            Default-Gateway Option 3, length 4: 192.168.130.1             <<<<<<<<<<<<<<<<<<<<<<
            Domain-Name-Server Option 6, length 12: 192.168.135.1,192.168.130.10,192.168.130.12
            Hostname Option 12, length 16: "basic10.wann.net"
            Domain-Name Option 15, length 8: "wann.net"
            BR Option 28, length 4: 192.168.130.255                       <<<<<<<<<<<<<<<<<<<<<<
            NTP Option 42, length 8: 192.168.130.10,192.168.130.12

After a while what stood out to me was that my DHCP server was returning a Default-Gateway of 192.168.130.1 in the RFC1497 Options section, which doesn’t match the Gateway-IP/GIADDR of 192.168.135.1 set earlier in the packet and isn’t even on the client’s 192.168.135.0/24 subnet. The broadcast address in the Options, 192.168.130.255, belongs to the wrong subnet too.
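The mismatch is easy to check mechanically. This is a minimal sketch using Python’s standard ipaddress module with the values copied from the DHCP ACK; the function name is made up for illustration and is not part of any DHCP tooling.

```python
import ipaddress

def router_is_on_link(your_ip, subnet_mask, router):
    """Return True if the offered router sits inside the client's own subnet."""
    network = ipaddress.ip_network(f"{your_ip}/{subnet_mask}", strict=False)
    return ipaddress.ip_address(router) in network

# Your-IP, Subnet-Mask and Default-Gateway as they appear in the ACK
print(router_is_on_link("192.168.135.29", "255.255.255.0", "192.168.130.1"))
# The gateway the client should have been given
print(router_is_on_link("192.168.135.29", "255.255.255.0", "192.168.135.1"))
```

A client handed a router it cannot reach on its own link has no way to get off the subnet, which would fit with no TFTP traffic ever showing up on the wire.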

It would seem that I have a misconfiguration in my ISC DHCP server somewhere. That’s what’s causing the wrong gateway to be returned, and makes sense in that the UEFI loader gets an address but can’t reach anything off the local subnet. Apparently all the previous times I’ve PXE booted these systems I’ve always used IPv6 and never hit this problem until I tested with IPv4. I hacked on my config to temporarily set the gateway manually to what it should be for the test host, and it PXE booted off the network just fine, using the TFTP server on the other LAN.

As to how my DHCP server configuration is wrong, I haven’t figured it out yet. I never put time into understanding how classes and groups were supposed to go. My config looked something like this:

option ntp-servers 192.168.130.10, 192.168.130.12;
option domain-name-servers 192.168.130.10, 192.168.130.12;

subnet 192.168.130.0 netmask 255.255.255.0 {
  next-server 192.168.130.1;
  option routers 192.168.130.1;

  host a {
     hardware ethernet ...
     fixed-address ...
  }
  host b {
    ...
  }
}

subnet 192.168.135.0 netmask 255.255.255.0 {
  next-server 192.168.135.1;
  option routers 192.168.130.1;

  host c {
     hardware ethernet ...
     fixed-address ...
  }
  host d {
    ...
  }
}

And in this example, for whatever reason booting host “D” would get the router from the other subnet. Every subnet example I’ve seen shows putting “option routers” in a subnet scope. I do have some groups and classes in there that are clearly fouling things up but I don’t see how.

From what I’ve been reading, class and host are top-level scopes and shouldn’t go inside subnet.

So I just re-wrote my DHCP configuration into what I believe now are the right scopes:

...
option ntp-servers 192.168.130.10, 192.168.130.12;
option domain-name-servers 192.168.130.10, 192.168.130.12;
next-server 192.168.130.10;

# For things that match this class, override global options
class "pxeclients" {
  match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";

  if option arch = 00:07 {
    filename "/efi64/syslinux.efi";
  } else {
    # PXELINUX >= 5.X is the new hotness with HTTP/FTP
    filename "/bios/lpxelinux.0";
  }
}

subnet 192.168.130.0 netmask 255.255.255.0 {
  option routers 192.168.130.1;
  include "/etc/dhcp/homenet-130.inc";
}

subnet 192.168.135.0 netmask 255.255.255.0 {
  option domain-name-servers 192.168.135.1,192.168.130.10,192.168.130.12;
  option routers 192.168.135.1;
}

subnet 192.168.136.0 netmask 255.255.255.0 {
  option routers 192.168.136.1;
  include "/etc/dhcp/homenet-136.inc";
}

group homenet {
  host a {
    fixed-address 192.168.130.x
    ...
  }
  host b {
    fixed-address 192.168.130.x
    ...
  }
}

group otherstuff {
  host c {
    fixed-address 192.168.135.x
  }
  host d {
    fixed-address 192.168.135.x
  }
}

Instead of putting a “host” inside of the “subnet” scope, I put them all at the top level. Apparently dhcpd just “knows” that a host belongs to a subnet based upon the fixed-address matching the subnet+mask given in the subnet declaration, instead of trying to use the “subnet” scope to organize “host” entries.
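To illustrate the matching described here, the following is a toy model only, not how dhcpd is actually implemented. It looks up which subnet declaration a fixed-address falls into and returns that subnet’s routers option. The subnets and routers come from the rewritten config; the function name is hypothetical and the sample address is the test client’s address from the packet capture.

```python
import ipaddress

# subnet declaration -> "option routers" value, from the rewritten config
SUBNET_ROUTERS = {
    "192.168.130.0/24": "192.168.130.1",
    "192.168.135.0/24": "192.168.135.1",
    "192.168.136.0/24": "192.168.136.1",
}

def router_for(fixed_address):
    """Find the subnet containing fixed_address and return its router, or None."""
    addr = ipaddress.ip_address(fixed_address)
    for cidr, router in SUBNET_ROUTERS.items():
        if addr in ipaddress.ip_network(cidr):
            return router
    return None  # no subnet declaration covers this address

print(router_for("192.168.135.29"))
```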

After doing this and factoring out some duplicate classes that set the filename, UEFI PXE booting my test system worked on the first try! I’m past the point of caring now to bisect my original config; this was a detour from doing other things. Besides, I should be burning all of this down and finishing my migration to the Kea DHCP server.
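For anyone who finds the dhcpd class syntax opaque, the single “pxeclients” class boils down to the decision below. This is a sketch of the logic only: the function name is made up, and the sample vendor class string is an illustrative value of the kind PXE firmware typically sends, not one captured from this board. It also assumes the config defines “option arch” as DHCP option 93 somewhere in the elided part.

```python
def pick_boot_filename(vendor_class, arch):
    """Mimic the pxeclients class: return a boot filename, or None for non-PXE clients."""
    # match if substring (option vendor-class-identifier, 0, 9) = "PXEClient"
    if vendor_class[:9] != "PXEClient":
        return None
    # if option arch = 00:07 (the value x86-64 UEFI firmware commonly reports)
    if arch == 0x0007:
        return "/efi64/syslinux.efi"
    # everything else falls through to the BIOS loader
    return "/bios/lpxelinux.0"

print(pick_boot_filename("PXEClient:Arch:00007:UNDI:003016", 7))
print(pick_boot_filename("PXEClient:Arch:00000:UNDI:002001", 0))
```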

Supermicro / Intel Atom 2550 fix

I’ve loved using the A1SRi motherboards in ikeacluster for years, they offered a lot of RAM power with a fan-less embedded CPU. I’ve run them for several years and had a couple eventually succumb to the Atom C2000 clock failure problems. The BMC would work but the motherboard would just decide one day to stop booting.

For years I wrote these off as dead and, I thought, beyond the period in which people were getting Supermicro to replace them. I was sad there was not a great replacement. I seem to recall the Denverton mini-ITX boards were either late or were considerably more expensive. There were some other A1SRi boards on eBay, but they were expensive and who knows when they’d die.

A while back I had read on some forums where people were working on a fix to revive these motherboards. I finally got around to reading up on the Serve The Home thread and a Truenas thread to try it out. On my boards I ran a 200 ohm jumper between pins 1 and 9 on the TPM header and that seemed to do the trick:

TPM header jumper

It still has the problem where the BMC is still alive and I can control VGA+keyboard input, but Linux ipmitool can’t query the BMC nor does serial redirection work. But I’m happy I was able to revive two of my motherboards! It also looks like even as late as 2023 people were getting these boards replaced by Supermicro, so I might have to give that a try.

Update 27-Oct-2024:

huh, I thought I had read this fix breaks connectivity between the OS and the BMC, but serial-over-LAN and things like ipmitool lan print work:

ipmitool sol activate after resistor fix

ipmitool viewing BMC sensors and network settings

I’ll take it!

The whole Let’s Encrypt thing has the side effect of making me cranky every few months as I go around checking what expired, what automatically renewed, and what needs more babysitting.

Today I want to bitch about the Ubiquiti UniFi controller software. As far as I can tell even the mere CONCEPT of updating TLS certificates STILL does not exist anywhere in the controller or the support documentation. Sure they make a nice web UI to manage your 11ty-dozen wireless APs, cameras, doorbells, LED panels, key readers, and whatever thing they’re pushing this month, but keeping the web UI secure and up to date in a post-Snowden world? Nah, screw you. Not even a clumsy annoying web way to do it, no “click here to re-generate a self-signed certificate”, not even a sanctioned command line way to do it. You’re utterly on your own to figure it out. I guess this is one carrot of forcing people to use their cloudy UI.com service.

This has led to countless people like me reinventing the wheel since 2016 and poking at the Java keystore directly with the old ACE.jar and keytool tools. You did naturally assume it’s a Java keystore the first time you encountered self-signed or expired certs warnings, right?

It’s even worse now when you layer all the Let’s Encrypt tools on top of it, because virtually all of them assume you’re on some form of Ubuntu or Linux. You won’t know it until you try to run a deploy script or read the code. I’m running it on MacOS which is a sanctioned platform and gets regular releases. The official acme.sh/deploy/unifi.sh claims it supports self-hosted, but it really assumes self-hosted on Linux. I’m afraid to know what the Windows people have to deal with.

What I wound up doing is using and tweaking the unifi_ssl_import.sh script from https://github.com/stevejenkins/unifi-linux-utils. This takes care of exporting a PKCS12 file and importing it into the Java keystore. It assumes Certbot and Linux, but it was easily adapted to Acme.sh paths on MacOS. Thank god this isn’t some gigantic monolith of bash and is fairly straightforward. I run only this script and it takes care of updating the UniFi keystore.

It is not automatic upon renewal, and doesn’t automatically restart the Unifi software. Those are problems for another day, maybe in 100 more days.
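Until that is automated, a small reminder check could at least say when the import script needs another run. This is a minimal sketch, not part of the script above: the function name is made up, and it only converts a certificate’s notAfter string into days remaining using the standard library’s ssl.cert_time_to_seconds.

```python
import ssl
import time
from typing import Optional

def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter timestamp (e.g. 'Jan  5 09:34:43 2018 GMT')."""
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    if now is None:
        now = time.time()
    return (expires - now) / 86400

# Example with a fixed "now" so the result is reproducible
fixed_now = ssl.cert_time_to_seconds("Jan  1 00:00:00 2025 GMT")
print(days_until_expiry("Jan 31 00:00:00 2025 GMT", now=fixed_now))
```

The notAfter string is the same format that “openssl x509 -enddate -noout -in” prints for a certificate file, minus the leading “notAfter=”.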

-UNIFI_HOSTNAME=hostname.example.com
+UNIFI_HOSTNAME=${HOSTNAME}

# Add this to override all of the Fedora/CentOS/Ubuntu/CloudKey paths
#
+# MacOS paths
+UNIFI_DIR="${HOME}/Library/Application Support/UniFi"
+JAVA_DIR="${UNIFI_DIR}"
+KEYSTORE="${UNIFI_DIR}/data/keystore"

# Script assumes Certbot paths, tweak for acme.sh
+# MacOS, this time for acme.sh
+ACMEBASE="${HOME}/.acme.sh/${UNIFI_HOSTNAME}"
+PRIV_KEY="${ACMEBASE?}/${UNIFI_HOSTNAME}.key"
+SIGNED_CRT="${ACMEBASE?}/${UNIFI_HOSTNAME}.cer"
+CHAIN_FILE="${ACMEBASE?}/ca.cer"

# Add -legacy option to openssl in two spots
+    openssl pkcs12 -export -legacy\

Maybe someday I’ll get around to sending in a PR to add MacOS support for the deploy script, but not today. I’ve already spent too much time shaving this yak and have other things to do.

Unhelpful responses from the peanut gallery on this issue:

  • Just type in “thisisunsafe” every time in Chrome! fucking hell, this isn’t even attempting to solve the problem. would you tell your director or CISO to do this?
  • Just proxy it behind Apache/Nginx/Linux!  no. now I have to support and configure two things.
  • Just run it on Linux! Bro, I swear a raspberry pi is all you need, bro please! no. see above, now I have to support an entirely different piece of hardware and OS.
  • Just don’t run the web UI! bro, their entire product revolves around running a web UI, how do YOU run it?

Or you know, Ubiquiti could actually provide a mechanism for uploading a new certificate+key pair.

TL;DR: Controller said RAID1 was lost after disks being powered on for first time after 20 years, I didn’t believe it. Booted into Linux and dd’d the last good disk. Recovered the UFS filesystem, I have 20 year old artifacts to sift through.  Always take images of your drives before mucking with them.

The main database server / admin server for my old ISP was a Dell PowerEdge 1550 1U server running Solaris 8 x86, on three 36 GB Seagate Cheetah SCSI U160 hard drives. It was shut down in 2004 when I folded the company, but I hung on to the drives in case I needed the records for disputes or something, and repurposed the server as a colocated shell server. I almost took the system to e-waste a few months ago when I was purging a bunch of other old rackmount servers from my storage unit, but decided to hang on to it for whatever sentimental reason a little longer.

Recently I was digging through old files to find old ISP setup notes. I found what I needed on my laptop, but it made me remember I still had the ISP drives and I should see if I had any more vintage notes and squirrel away an image of the OS so I could finally ditch the hardware. I had no intention of ever firing this stuff up again and considered it a forgotten memory. The old hard drives have been in my drive collection in the bedroom, so that’s about as good as storage gets for them.

In search of RAID

During the time at the ISP the server was using a Dell/Adaptec PERC hardware RAID controller, so I’d need that to revive the data. I took the controller out when I switched to Linux with software RAID using the on-board Adaptec AIC-7899 SCSI controller, and I have no idea what I did with it. I probably e-wasted it a long time ago. So first thing I needed to do was find out what kind of PERC card it had and go find one on eBay. My system was so old I couldn’t even look up the service tag on Dell’s website anymore. The PowerEdge 1550 has been lost to time, there’s very few photos of it online, and none that I found with a PERC installed to reference. I guessed from some service notes and went with a Dell 493 PERC 3/DC card, which sounded vaguely familiar and was around the right vintage.

I made sure the system could actually power on and put in a set of Linux disks from the colo days. Other than a dead CMOS battery, the system eventually booted into Linux as a test just fine. I have no idea why but it takes several minutes for POST to run and load the Adaptec 7899 BIOS, I don’t remember it being this achingly slow.

Next it came time to try the Solaris hard drives. I had no idea what RAID configuration I used, I kind of assumed I probably did a RAID 5. No idea of the order of the drives. I wasn’t even sure which version of Solaris was on there. I first powered up the system without the drives, went into the PERC firmware and reset all the logical device configuration to defaults. I popped in the Solaris drives and right away on boot the PERC BIOS spun up two drives.

Going into the PERC BIOS again, it had imported a RAID1 configuration from the drives. Two drives were in a logical group, one marked ONLINE and one marked FAIL. The third drive was marked as HOT SPARE. That was a promising start!

A brief glimmer of hope after 20 years

I didn’t put a lot of care into trying to recover this, it was more of a nice-to-have. #YOLO. I let the system boot, told the PERC to proceed with the degraded logical volume group. Up pops the blue Solaris Boot Subsystem screen! Right at this same time the PERC alarm starts SCREECHING because of the failed drive and it was LOUD. I had forgotten all about this and there were no buttons or anything anywhere to silence it. There’s no way I could work on this thing in an apartment with that going off.

I hit the power button to turn off the system, turned it back on and went back into the PERC menu to silence the alarm. Except now in the PERC BIOS all drives were marked FAILED! wtf!

 

Artist’s re-enactment of RAID failure

I wasn’t completely convinced the drives died all of a sudden after one power-off and thought it was more likely there was some sort of bad state stored in the RAID configuration from the power-off. I fiddled with it for a while, trying to remove the config from the card and re-importing it, moving drives around in drive slots, and it kept coming back as FAILED. One of the disks had to still be working to read the RAID config I thought. I also didn’t know the numbering of the drive slots, so I wasn’t sure which two were the data drives and which was the hot spare anymore. Did I mix the old hot spare into an order it expected to find a RAID member? Did one RAID member just die?

So I put it all aside for a few weeks to ponder.

What to do

If it was a RAID1, I thought, in theory both drives should have a usable set of data outside the RAID metadata, provided they were still mechanically functional. Even if the sync was broke and one had a slightly older set of writes, this was fine for this archeology dig. The question was whether the RAID metadata would throw off any tools used to poke at the filesystem. Message board posts all suggested hooking the drives up to a non-RAID SCSI controller to take the hardware RAID out of the picture, then taking images of the drives if they showed up, so that recovery tools could be tried on the copies. This was slightly more complicated in that the Solaris 8 filesystem is the older UFS, not ZFS or EXT3/4. Several commercial packages promised they could recover UFS for a modest three digit sum.

I decided on hooking the Dell drive backplane directly to the onboard Adaptec SCSI controller and booting Linux. If the drives showed up I could at least dd a copy of them to fiddle with later and would have more tools to poke at the SCSI bus.

Getting Linux over was going to be work, the system didn’t support booting from USB. It had an IDE CD-ROM drive, a 3.5″ floppy drive, and could network PXE boot. While I have a functioning PXE environment and actually PXE installed CentOS on this system when I had it in colo, I long since removed my old CentOS 5 files. Rigging up a PXE bootable Live ISO image just for this sounded like a lot of work. Ubuntu 14 server was the latest i386 version I could find that still fit on a CDR disc. Miraculously I still had five blanks laying around. The only CD burner I owned was in my Windows 95 machine, so instead of shelling out money on Amazon for another external burner, I went to a lot of effort to just burn it using the 486 (at 2x!).

Of course when it came time to boot, the CD drive in the Dell was not working anymore. I wound up throwing together enough PXE glue anyways to boot the CentOS 6.10 i386 installer in rescue mode. This kernel should well be new enough to have all the 2000-era Adaptec drivers built-in.

Struck data!

One by one I tried all three hard drives. The first one oddly showed part of a serial number to the Adaptec BIOS, but otherwise was undetected by Linux. The second drive showed up! An fdisk -l detected two partitions, “Solaris boot” and “Linux swap / Solaris” !!!

I popped in a USB stick which at least showed up as a mass storage device to Linux and I began a dd of the hard disk to it. About 15 minutes later I checked progress on another vty and quickly realized it had only copied a few dozen megabytes and this was probably using USB 1.1 or maybe 2.0 and it was going to take all night to copy this drive. Would the hard disk survive this long? I threw together a dd | ssh command and let it copy a couple of images across the network to another system. It’s a Pentium III 933 MHz system, so not a complete slouch.
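With more than one image of a possibly dying disk, it helps to know whether the copies are identical. This is a sketch of the idea, not what was actually run (that was a plain dd piped into ssh): it copies a stream in chunks and hashes it along the way, so the digest can be compared against the file that landed on the other system. The function name and chunk size are arbitrary choices.

```python
import hashlib
import io

def copy_with_digest(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst in chunks; return (bytes_copied, sha256 hex digest)."""
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:          # empty read means end of input
            break
        dst.write(chunk)
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()

# Small in-memory demo; for a real disk, src would be the device opened with "rb"
source = io.BytesIO(b"\x00" * 3_000_000)
target = io.BytesIO()
copied, sha = copy_with_digest(source, target)
print(copied, sha[:16])
```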

Eventually after a couple of hours the dd over the network succeeded without any sort of errors, so I had at least one copy of whatever was on that disk. I have no idea if that was a working member of the RAID1, or if once upon a time it was part of the RAID1 and I demoted it to hot spare without wiping it, or what. The 3rd disk was completely dead, it didn’t show up on the Adaptec at all. So it seems I did lose one disk during my initial power-off.

After I was satisfied I had as good a copy as possible, I let the good disk boot in the system by itself to see what would happen. The blue Solaris bootloader screen loaded, then dropped into the configuration assistant. It didn’t seem to find a kernel on disk to boot, but otherwise the disk acted fine.

Over on another Linux system I ran “strings” on the 36 GB image I captured and it clearly had some viable data in it. I saw a bunch of email, sendmail config, html, mysql commands, and other stuff I recognized. Now the question was how to mount this sucker under Linux. I did some reading and Linux does have UFS support, including Sun x86. I learned that Solaris slices are different than typical Linux partitions in that they’re more a set of logical extended partitions within a standard partition. The Linux kernel with the UFS module loaded understands this and as I saw with the Solaris drive inserted over on the Dell, it will enumerate all the possible slices as extra disk partitions, e.g. sda1 sda2 sda3 sda4 sda5 ... sda15 even if tools like Linux fdisk and parted only see a boot and data partition.

Linux recognizing Solaris disk slices

Here’s what fdisk looked like when reading the captured dd image itself:

root@basic06:~# fdisk -l ./image-sda2
Disk sda2: 33.9 GiB, 36328801280 bytes, 70954690 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x69747261

Device Boot Start End Sectors Size Id Type
sda2p1 1851867950 2396369563 544501614 259.7G 6f unknown
sda2p2 1397314113 3266884704 1869570592 891.5G 20 unknown
sda2p3 0 0 0 0B 6f unknown
sda2p4 20480 20480 0 0B 0 Empty

Partition table entries are not in disk order.

Trying to mount UFS from Linux

(See 9/26 update where a newer kernel fixed all this) I tried a variety of ways to mount the UFS filesystem on Linux with no luck. “mount -t ufs -oro,ufstype=sunx86” didn’t work on an extended device id for a slice such as /dev/sda10, on the raw image file of just the 2nd Solaris data partition, or on the image of the entire disk. I tried some examples of calculating offsets to mount specific slices or possibly skip any RAID metadata, and those didn’t work either. I got a variety of wrong fs type, bad option, bad superblock, or ufs: ufs_fill_super(): bad magic number errors with these attempts. losetup and friends didn’t seem to work for me either, though to be fair I’d never used them before.
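One way to take the guesswork out of those offsets is to hunt for the UFS magic number directly in the image. This is a hedged sketch, not something used in the recovery: it assumes a little-endian UFS1 filesystem (as on Solaris x86) whose superblock sits 8192 bytes into the slice with the magic value 0x00011954 stored 1372 bytes into the superblock. The function name is made up, and the demo scans a synthetic in-memory buffer rather than the real image.

```python
import io
import struct

UFS1_MAGIC = 0x00011954   # fs_magic value for UFS1
SBLOCK_OFFSET = 8192      # superblock starts 8 KiB into the filesystem
MAGIC_IN_SB = 1372        # offset of fs_magic inside the superblock
SECTOR = 512

def find_ufs_starts(image):
    """Yield byte offsets in a binary file object where a UFS1 filesystem may start."""
    sector = 0
    while True:
        image.seek(sector * SECTOR + SBLOCK_OFFSET + MAGIC_IN_SB)
        raw = image.read(4)
        if len(raw) < 4:      # ran off the end of the image
            break
        if struct.unpack("<I", raw)[0] == UFS1_MAGIC:
            yield sector * SECTOR
        sector += 1

# Synthetic demo: pretend a filesystem starts at sector 4 of a 64 KiB "disk"
demo = bytearray(64 * 1024)
struct.pack_into("<I", demo, 4 * SECTOR + SBLOCK_OFFSET + MAGIC_IN_SB, UFS1_MAGIC)
print(list(find_ufs_starts(io.BytesIO(demo))))
```

On a real image the hits would also include backup superblock copies, so only the first hit near each expected slice start is interesting. A candidate offset could then be tried with losetup’s --offset option or mount’s offset= loop option. Scanning every sector of a 36 GB image this way in pure Python would be slow.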

Another idea I had was to copy the image to a USB stick on another system and letting the kernel detect it as a drive again. Trying to mount it this way didn’t work while I was booted into CentOS 6, I thought maybe a newer kernel would help. I let it copy to USB while I went on to try the next thing, installing Solaris. (I wound up not using this)

Installing a Solaris 8 VM

I gave up and installed Solaris 8 Intel in a VirtualBox VM to see if I could mount the image there. It’s been yeaaaaars since I’ve touched Solaris, much less v8, but I got something working. I had to convert the dd image to a .VDI image so VirtualBox could actually present it as a drive to the VM. (“VBoxManage convertdd image1-sda image1-sda.vdi --format VDI“).

Within Solaris I had to run devfsadm after boot to get it to recognize this as another IDE drive. It showed up as /dev/dsk/c0d1, and “format” listed a bunch of slices when it was mounted!

Finally, success!

At long last I was finally able to mount the individual slices, and there were intact filesystems with my files!

Browsing around it looked familiar, all bits and pieces of a working system. It looks like this stuff is somehow from about 2003, so this may be leftover from a drive swap, I don’t know.

I also forgot Solaris doesn’t have anything like ssh or rsync out of the box, or I forgot where to install it. So I’m going old-school and running a “tar | rsh” to another system to sift through it more.

I am curious to go looking for the hardware RAID metadata on this disk, is it at the beginning, the end? What does it look like?

Update 9/26:

Fiddling with the whole disk image on a CentOS 7 system with a 5.3.5 kernel, I have success mounting the UFS filesystem, whereas this was failing over on Ubuntu 18 with a 4.15 kernel:

# Mounting with a loop device
[root@basic03 ~]# losetup --partscan --find --show ./staff1-9pf-sda
/dev/loop0

[root@basic03 ~]# dmesg -T
[Thu Sep 26 23:46:44 2024] loop: module loaded
[Thu Sep 26 23:46:51 2024]  loop0: p1 p2
  p2: <solaris: [s0] p5 [s1] p6 [s2] p7 [s3] p8 [s4] p9 [s5] p10 [s6] p11 [s8] p12 >

[root@basic03 ~]# fdisk -l /dev/loop0
Disk /dev/loop0: 36.4 GB, 36420075520 bytes, 71132960 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00000000

      Device Boot      Start         End      Blocks   Id  System
/dev/loop0p1   *       16065       48194       16065   be  Solaris boot
/dev/loop0p2           52610    71007299    35477345   82  Linux swap / Solaris
[root@basic03 ~]#

[root@basic03 mnt]# mkdir s0 s1 s2 s3 s4 s5 s6 s7 s8 s9

# Mounted Solaris slice 0 containing / using the linux /dev/loop0p5 partition
[root@basic03 ~]# mount -oro,ufstype=sunx86 /dev/loop0p5 /mnt/s0

[root@basic03 ~]# mount | grep mnt
/dev/loop0p5 on /mnt/s0 type ufs (ro,relatime,ufstype=sunx86,onerror=lock)

# Solaris / directory!
[root@basic03 ~]# ls -l /mnt/s0
total 39
lrwxrwxrwx  1 root root     9 Sep 24  2001 bin -> ./usr/bin
drwxr-xr-x  2 root root   512 Sep 24  2001 boot
drwxr-xr-x  3 root 60001  512 Sep 24  2001 cdrom
drwxr-xr-x 12 root sys   3584 Sep 27  2001 dev
drwxr-xr-x  6 root sys    512 Sep 24  2001 devices
drwxr-xr-x 30 root sys   3584 Jun  8  2003 etc
drwxr-xr-x  3 root root   512 Sep 24  2001 export
...

 

Improvised drive sled

[photos: flickr – Macintosh Quara 700 drive sled]

The Quadra 700 I acquired had the internal plastic assembly that held the floppy drive and hard drives, but didn’t have the sled that the hard drive went in and clipped into the system. These are hard to find on top of an already hard to find system, lore seems to be when recyclers yank the hard drives, they discard the sleds. The Quadra 700, IIcx, and IIci all use the same sled, model numbers starting with 805-5078 or 815-5078. I checked several component places and eBay, and nobody had any for sale. You can still put the hard drive in the Quadra, there’s just nothing holding it in place preventing it from flopping around.

Fortunately lots of photos of the thing exist, and it’s just a U-shaped piece of sheet metal with some holes and tabs stamped in it. It seemed easy enough to just go make one. I broke out the ruler and caliper and made some measurements of my own system. Later I discovered this post on 68kmla by Phipli, who had a drawing of the drive sled which gave me the outside dimensions and let me fine-tune my own measurements a bit more. Then I discovered the 3D printed version by branchus on Thingiverse (I love his Mac repair streams). I don’t have a 3D printer, and I didn’t feel like learning Fusion360, learning how to use a 3D printer, and tracking down one of our libraries just for this when I already have the metal and a metal brake. A local 3D service quoted me $38 to print one, which felt steep.

Nibbling the holes

I started with a piece of 18 gauge aluminum 85mm x 196mm. I didn’t yet know how much I needed to compensate my measurements for the bending so I started working from the inside going out. I made sure the inner dimension was at least 103mm wide to allow a 3.5″ drive to slip in. First I nibbled out holes for the raised square bit at the bottom and the sides.

I bent a small scrap piece and figured out the bend would eat about 1.5mm, then used that to find the position and dimensions of the vertical holes where the tabs would lock in. I finished marking all of these on the metal. If you can read my scribbles, that’s all of the dimensions, give or take a mm.
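For anyone redoing the math, here's a rough sketch of the bend arithmetic using the numbers above. It's not how I actually laid the part out, and it makes two assumptions: the 196mm edge of the blank is the one that wraps around the drive (the 85mm edge is too short to span 103mm), and each of the two bends costs the same 1.5mm. The `side_height_mm` function name is made up for this example.

```shell
#!/bin/sh
# Rough side-wall height for a U-shaped sled cut from a flat blank.
# Assumptions for this sketch: the blank's long edge wraps around the
# drive, both side walls are equal, and each of the two bends costs
# bend_loss millimetres of material.
side_height_mm() {
    blank=$1; inner=$2; bend_loss=$3
    awk -v b="$blank" -v i="$inner" -v l="$bend_loss" \
        'BEGIN { printf "%.1f\n", (b - i - 2 * l) / 2 }'
}

# 196mm blank, 103mm inner width, 1.5mm lost per bend
side_height_mm 196 103 1.5
```

Under those assumptions each side wall comes out around 45mm tall. Swap in your own scrap-bend measurement for the 1.5 if your metal or brake behaves differently.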

My Dremel was utterly dead so I took shortcuts with the rest of the cutting. The OEM part had a D-shaped cut over the side humps, presumably to let part of the side remain straight upright while the rest flexed. I omitted this completely. I thought a 1/2″ cold chisel would be perfect for knocking out the side tabs that would lock into the plastic assembly in the case. After a few whacks with a hammer it hadn’t punched through the aluminum like I had hoped, so I went with the jankiest move of this whole thing: hammering a screwdriver through it!

Punching through actually worked pretty well at giving me protruding tabs on the outside surface, and I smoothed them a bit here and there with pliers to get them just right. Next up I used my metal brake to fold the sides up. I didn’t trim the tops of the vertical pieces to provide finger tabs like the original has; it seemed to fit fine without them.

The thing actually fit into the system almost exactly on the first try. I had to do a bit more nibbling on the square hole in the middle and square off my bend, and then it fit nicely. I messed up drilling the screw holes, so they aren’t pretty, but they work.

All in all, the thing works. I can pinch the sides to take the drive in and out, and it locks the hard drive in place. I’d say pretty good for a Saturday afternoon of tinkering around without the right set of tools. Now I can continue lurking eBay sales hoping for an original sled, or get around to having one 3D printed someday. If I had an OEM sled I would be tempted to get better measurements and send them off somewhere to laser/plasma cut a few dozen sleds to hand out, but eh, I don’t want to be in the shipping biz.

Final product
