
A collection of stories of evolution, follies, starting from scratch, and what not to do.

When I started my ISP back in the summer of 1996, how to bill customers and collect money from them was an afterthought. I started with nothing, not even a database, and without realizing what a vital chore it would become. There were some ISP billing programs out there, but they were often very expensive, rigid, or expected you to be a Windows NT shop. Actually generating invoices and processing checks consumed just as much time as providing customer technical support. If I had a better story around payments the business might have been more profitable and lasted longer. It gave me a lot of appreciation for the problem of billing and payments; there are a lot of considerations that have to go into it.

There were a few decisions early on that set the course of things to come:

Metered billing

I was one of the few ISPs that did metered billing instead of “unlimited” access. You subscribed to a base tier where for so many dollars per month you got so many hours online, and then paid so many cents per minute of overage after that. (E.g. 25 hours for $15.95/mo or 240 hours/mo for $25, 2 cents/minute overage.)  Unlimited access was the norm, where you paid a flat rate of $19-$30/month and used it however you wanted. The idea that somebody could leave their computer on all day sitting there hitting the POP3 email server while not at home, occupying my limited phone lines and modems, was completely ludicrous to me. To me selling “unlimited” service felt shady and like a legal liability, because there were always buried terms of service to harass or fire the customer for being online “too much” right after telling them “use it as much as you want”.

This model led to two immediate problems: how to do recurring monthly billing along with usage, and how to process call accounting. The usage model ruled out using something like Quicken to send out simple invoices. A few ISPs I knew repurposed old pager billing software, as it had the notion of recurring billing and minute rates. I forget what I did in the very beginning; there was probably some perl script I got somewhere that processed RADIUS accounting and generated daily reports.

Checks, invoices, postal mail

Being a rural area in 1996, the standard way people paid was with a check or money order. You mail them an invoice, they mail you a check. Not many people had credit cards, or if they did they were very hesitant to use them for a monthly service. There was a fair amount of money orders coming in; people would take cash to the US Post Office, buy a MO and send it to us. The office door also had a mail slot, so people could drop off cash or a check there too.

Billing date

Another big decision I had to make was “when” to bill the customer. Most companies billed at the first of the month. If you signed up for service in the middle of the month then they would generally pro-rate the bill and add it to next month’s bill. I realized right off the bat this meant doing a huge batch of work at the end of every month, and all the money would only come in during the middle of the month. It seemed like it would be better to spread out the work throughout the month and ensure money was constantly coming in. Plus I didn’t want to do this pro-rating every time somebody signed up. So, anniversary date billing it was. If you signed up on the 18th, then your service was due for payment on the 18th of every month, and I sent you an invoice two weeks ahead of that on the 4th.

But what happened if you signed up on the 29th-31st of the month? I just set your anniversary to the 1st of the next month.

I also billed ahead for service, because I needed the money now to operate. None of this business where you use it for a month and I bill you afterwards on net 30. I didn’t like the idea of people signing up, using the service all month, and never paying a cent. When you signed up, your first month was due immediately. It also made service suspension tidy: if you didn’t pay your bill by the start of the new period (with a very brief grace period), your account didn’t work.

In reality this made for a lot of awkward conversations with customers: “yes, you see, you pay for next month’s service, but we’re also billing you for any overages from the last full month you had”. Some people had problems with my service dates, for instance going from Mar-4 to Apr-4 instead of Mar-4 to Apr-3. I don’t know if it was right but it seems to have worked out.

Anniversary date: 18th
Invoice generated + mailed out   3/4
Calculated overages through      1/18-2/18  < Billed for January-February's overage
Due date, anniv date             3/18       < Billed for March-April's service

Excel as a billing system

Customer ledger

From the very beginning, I used Excel and a lot of copy-paste to send out bills. A worksheet would contain a generic form of the invoice to be mailed out, and another worksheet contained all of the names and addresses of customers, previous balances and payments, arranged by billing date. Every day I would go through the sheet, basically doing a hand mail merge: copying every customer’s address from one sheet into the other, looking up any usage overage, and printing out their invoice. This took an hour or two a day, and often dad would begrudgingly volunteer to do it for me in the evenings (thanks dad!). I forget how usage information got on there, whether it was manually running a perl script on the database server and keying it in by hand, or if I somehow made a .CSV to load into Excel every day to copy from.

I actually found a floppy disk the other day that had a copy of this 16,000+ row .XLS file on it which prompted me to write this post. Excel usage lasted well into 1998, much longer than I thought it did!

Page to be printed and mailed


The first billing software I bought dedicated for the purpose was called BATS (Billing And Tracking System). This was a Unix-based program that was supposed to do it all: billing, RADIUS parsing, overage tracking, accounts receivable, the works. The problem was that by the time I got all of my customer information typed into it, we had already blown past the user license and couldn’t add any more users. The next license level was a huge jump from 75 to 5,000 users, and I couldn’t afford the upgrade. This was back when $1k was a lot of money, as I was still burning cash every month and not remotely profitable. I never did bill any customers with it; I seem to recall it had a lot of nits I didn’t like and/or I couldn’t get it integrated with my setup, so I never scraped together the cash to upgrade.

I famously predicted that maybe after a few years I might have a couple hundred users on the service, but this thing took off like a rocket and I had over 100 customers within months. Blowing out the license count and doing things by hand in Excel wasn’t going to work at all. I didn’t know what to do, but something had to be done.

Enter perl

I had never done any serious programming at this point, and had never used an SQL database. I dabbled in C but didn’t get it. I knew perl was popular for system administration so I gave it a try. One of the first big problems was processing the customer dial-up usage. I needed not only to put this into a database to do billing, but also to give customers a way to look up their own usage so they didn’t blow up their bills.

I literally went from print("hello world"); $a = 100; print("a is $a"); into learning to parse RADIUS accounting logs and INSERTing them into a MySQL database. (Back when mSQL vs MySQL was the MySQL vs Postgres of its time.) This made it way easier to look up somebody’s usage. I had a perl script that would parse the accounting logs every day and toss them into a table. Later on FreeRADIUS would do this.
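
Here’s a minimal sketch of that kind of script: walk a Livingston/FreeRADIUS-style “detail” accounting log and INSERT the Stop records with Perl DBI. The table and column names are invented here for illustration, not my original schema.

#!/usr/bin/perl
# Sketch: parse a RADIUS "detail" accounting log and INSERT the
# Stop records into MySQL. Table/column names invented for
# illustration.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=isp;host=localhost',
                       'billing', 'secret', { RaiseError => 1 });
my $sth = $dbh->prepare(
    'INSERT INTO calls (username, stop_time, seconds) VALUES (?, ?, ?)');

local $/ = "\n\n";    # detail records are separated by a blank line
open(my $log, '<', '/var/log/radacct/detail') or die "detail: $!\n";
while (my $rec = <$log>) {
    next unless $rec =~ /Acct-Status-Type = Stop/;
    my ($user) = $rec =~ /User-Name = "([^"]+)"/;
    my ($secs) = $rec =~ /Acct-Session-Time = (\d+)/;
    my ($when) = $rec =~ /^(\w{3} \w{3}\s+\d+ \d\d:\d\d:\d\d \d{4})/;
    $sth->execute($user, $when, $secs) if defined $user;
}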

This taught me about database indexing. My schema had no keys or indexes at all, so over time lookups got slower and slower. One day I learned I could do an ALTER TABLE..ADD INDEX (username,time) to index on the username and call time columns, and holy shit, it was mind blowing how fast lookups got.

As I got more familiar with Perl and MySQL I got to the point where I was able to write a very rudimentary billing system and stop using Excel. There were a few tables with the customers’ names and addresses, their account type, and a ledger to keep track of their balance. Every morning billing.pl would run, look for whose billing anniversary date was coming up, roll the service dates in the database out a month, calculate last month’s overage usage, write out a text invoice to a file, and then shoot that file to the printer with lp. Oddly, doing date calculations and re-formatting date fields became the bane of my existence, and I had to go through a few Perl modules to find something that worked well.
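
A heavily condensed sketch of what that morning run amounted to (the schema, rates, and paths here are invented for illustration, and like the real thing there are no transactions):

#!/usr/bin/perl
# billing.pl, condensed: find accounts due two weeks out, tally the
# last period's overage, roll the service dates forward a month, and
# spool a text invoice to the printer with lp.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=isp', 'billing', 'secret',
                       { RaiseError => 1 });

my $due = $dbh->selectall_arrayref(q{
    SELECT acct, name, rate, hours_included FROM customers
     WHERE next_due = DATE_ADD(CURDATE(), INTERVAL 14 DAY)
}, { Slice => {} });

for my $c (@$due) {
    # Overage: minutes used in the last period beyond the plan.
    my ($mins) = $dbh->selectrow_array(q{
        SELECT COALESCE(SUM(seconds), 0) / 60 FROM calls
         WHERE username = ? AND time >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH)
    }, undef, $c->{acct});
    my $over  = $mins - $c->{hours_included} * 60;
    my $total = $c->{rate} + ($over > 0 ? $over * 0.02 : 0);

    # Roll the service period out a month.
    $dbh->do('UPDATE customers SET next_due = DATE_ADD(next_due,
              INTERVAL 1 MONTH) WHERE acct = ?', undef, $c->{acct});

    # Write a text invoice and shoot it at the printer.
    my $file = "/var/spool/invoices/$c->{acct}.txt";
    open(my $inv, '>', $file) or die "$file: $!\n";
    printf $inv "%s\nAmount due: \$%.2f\n", $c->{name}, $total;
    close($inv);
    system('lp', $file);
}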

For quite a while this system assumed a customer had one and only one dial-up account; everything was keyed on their username. I couldn’t bill for anything else without creating a new username. It felt like quite an achievement to go back and refactor my code to basically add in another while loop to handle multiple services, but it all worked nicely and gave a lot of flexibility.

Part of this big change required assigning everyone an account number; that account number would then own all the usernames associated with it. I made a fatal flaw here and used the username’s unix UID number along with the account creation date to ensure uniqueness, in the form “uuuu-mmddyyyy“. The major problem was that this scheme leaked the number of subscribers I had, so one could extrapolate how fast I was growing over time if they knew several account numbers. I don’t know if anyone did that, but it was right there in the open. At 13 characters it was also much longer than it needed to be; 5 digits would have been plenty to identify an account. I really wish I had hashed it to a shorter sequence of letters and numbers, as it was a lot to say over the phone and type in all the time. Toward the end the UID portion was also a sad reminder of how many thousands of dial-up accounts we had created over time and how many were left active.

There were still many invoicing edge cases to deal with by hand. For example if somebody upgraded or downgraded their account this involved manual SQL queries and/or manually editing a text file to send them a new invoice. There were often grandfathered accounts when I’d change around services and had to keep the old rates in code. Or if there was a bug, printer problem, missed day, it involved going into the script and manually setting some variables and running it again, praying it didn’t just nerf everyone’s service dates. I didn’t know about SQL transactions, and MySQL didn’t even have them yet!

Bills had an invoice number and a sequence letter starting with ‘A’ on them. If your sequence was ‘B’ or even ‘C’, that meant some of the day’s invoices were manually re-printed due to a problem, to ensure we sent the right ones out.

De-coupling database operations from the print operations was a first big step to unravel the mess. One perl script would go through and roll forward all the service dates, generate pending ledgers, then other scripts would come along to generate invoices and run credit card payments. Several more rewrites would follow, billing.pl became billing2.pl, billing3.pl, get-bills.pl, cc_proc.pl, print-inv2.pl, and god knows what else. I learned to embrace use strict; in perl to prevent a lot of bugs.

There was no grand design here, there were no libraries, barely just a set of included common files. I didn’t have any experience with how billing “should” be done until I worked at other places. This was all learning on the go and doing what was necessary at the time. I copy pasta’d the same boilerplate Perl DBH to MySQL connection code in practically every single script because it was easier than refactoring it all. Source code management was making a tarball of the scripts and MySQL databases.

Perl-based invoice generation

Enter PHP

Around this same time I was starting to learn PHP too, out of necessity. Coming off my experience with MySQL, I started using PHP to make it possible for other employees to deal with customer data. There were some basic internal web forms backed by Perl CGI scripts to enter new signups, change addresses, enter payments, and see payment history; these were mostly re-written in PHP so the code was in one file per feature.

This became a whole dual codebase beast. Anything on the backend was written in perl and did the bulk of database manipulation, invoicing, and reports. All customer or employee-facing stuff was written in PHP. There was frequent re-implementation of functions in perl and PHP, and the code was littered with global variables. I briefly tried using Perl-Mason (embedded Perl in HTML), but PHP was just faster to write and ran faster.

The internal PHP tools website grew more and more features for employees to interact with. I implemented a rudimentary “issue tracker” which was just a glorified text form for each customer, to keep track of account notes and technical support issues. I had embarked on a complete v2.0 rewrite of all of this that I never finished, which wound up with us using some legacy PHP pages for some work and the new PHP pages for some other work. Like perl these were largely all one PHP/HTML page per function, e.g. add a user, modify a payment, update a credit card, customer search pages, add/cancel services, bad check processing, reports for collections, etc.

I still have all the code from this today and couldn’t tell you which of the half dozen half-rewritten directories of code were the latest working versions.

Credit cards

I accepted credit card payments pretty early on, using the payment processor one of my upstream ISPs used. Again, a very manual process at first. When customers signed up at the bottom of the form was an area where they could write in their credit card number for automatic billing. The processing company provided a DOS-based settlement program that was a glorified handheld terminal. You keyed in the credit card number, the address, amount and hit send. This would dial out on a modem to submit the charge.

In the beginning there was a field on the Excel spreadsheet or database to note that an invoice was to be billed to a credit card, not mailed out. So every day or so I would take the stack that was set aside and key in the few credit card numbers and submit them.

Eventually I got to the point where I was storing credit card numbers in MySQL (this was long before PCI DSS and I could do this). I could then write CSV files with payment information, and then submit that batch over modem to the processor for settlement. If the credit card was declined, I wrote a note on the paper invoice and mailed it to you.
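
The settlement batch itself was just a flat file. A sketch of the idea; the real CSV layout was whatever the processor dictated, and every table and field name here is made up:

#!/usr/bin/perl
# Sketch of the daily card-settlement batch. The CSV layout and
# schema here are invented for illustration. (And yes, storing card
# numbers in MySQL predates PCI DSS by years.)
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=isp', 'billing', 'secret',
                       { RaiseError => 1 });
my $rows = $dbh->selectall_arrayref(q{
    SELECT i.invnum, c.cardnum, c.expiry, i.amount
      FROM invoices i JOIN cards c ON c.acct = i.acct
     WHERE i.pay_method = 'CC' AND i.settled = 0
}, { Slice => {} });

open(my $csv, '>', 'settle.csv') or die "settle.csv: $!\n";
for my $r (@$rows) {
    printf $csv "%s,%s,%s,%.2f\n",
        $r->{invnum}, $r->{cardnum}, $r->{expiry}, $r->{amount};
}
close($csv);
# settle.csv then went out over the modem to the processor.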

Way later we signed up with Authorize.net to do real-time and batch card settlements over the Internet. I recall something really bad happened with them, like they were late to deposit funds or we had a contract dispute, and this resulted in going to yet another payment service.

Checks, everyone hates checks

Checks were such a pain in the ass, I wish I had been able to stop taking them or seriously discourage their use. They cost a lot of time and money. Not only did this involve printing invoices and mailing them to customers, it involved receiving the checks, keying the payments into the system, and taking them to the bank. Because I was sending out bills every single day, this meant we received payments every single day and they had to be deposited. We were doing well over a thousand check deposits a month, and because of the janky date adjustment I did for people who signed up between the 28th-31st, at the beginning of the month there would be a huge pile of checks to go through. I’m not joking, there would be literally a stack of checks 1.5″ thick to process at the beginning of every month.

Our bank did not like this one bit! It got so bad they warned me they were spending so much time on our big deposits they couldn’t close their books at the end of the day — either help them out or take our business elsewhere. It was a small town bank, they had no way to take any of this electronically or do ACH.

By now I already had an internal system for bulk check entry. There was a PHP web form where we could enter in a list of daily check payments to apply in bulk, without moving off the keyboard or going to other screens (unless somebody didn’t have their account number). Based on this I was able to print a form with the check info and totals, in chunks, to send along to the bank with the deposit, so they could go down the list and just verify we were right instead of doing the work themselves. The bank was somewhat happier.

Later I started dealing with a new business bank in Tulsa who offered electronic check deposits, but they did not have any kind of API. It turned out all they had was a web form to enter in the routing and account numbers and amount. I thought maybe I could scrape their deposit web page and do a POST request, but their HTML form wound up being a mess of input fields and I gave up.

I forget if I ever did get to a place where we could submit checks electronically or if we were still taking them to the bank up until the very end.

Paper billing

Every day at 11 AM (even weekends) the billing cron job would fire and invoices would start rolling off the printer. It would be somebody’s job to take the pile of paper, fold it, and stuff it into envelopes. We tried a few different paper folding machines and they worked until they didn’t, mangling a lot of invoices which needed re-printing. I got to where I would just take 10-20 sheets at a time, fold them by running a metal bar across them to give a rough shape, then start stuffing them into windowed envelopes. For sealing them I’d hold a tape dispenser in one hand and just dab a 1″ strip of tape across the back flap of each one. For postage we had a postage meter for a while, but Pitney Bowes wanted a lot for the rental and it was annoying to constantly replenish the account. So I wound up buying coils of stamps and either using a stamper or going through the stack and putting them on by hand. It could all be carefully organized and done in a mechanical manner by hand, but it had to be done every single damn day.

The post office didn’t like us either; with our volume they demanded we sort our outgoing mail into local and non-local ZIP codes to make their sort easier.

Late payments were always a problem, as was associating the payment with the right account. Nobody would write their usernames or account numbers on their checks, or somebody would write a personal check for a business account, names changed, etc. To encourage people to send in their payment in a timely manner and to figure out who was who, I started including a pre-printed return envelope, going as far as buying paper with a perforated bottom for a stub they could tear off and return. This worked pretty well, and virtually everyone used them, but it was yet another expense.

Once we were sitting around thinking “you know, we haven’t done bills in a while, did you do them?” “No, not me”. I realized after a system migration the billing cronjob had been commented out and no bills had been generated for at least a week! This took a bunch of careful editing of the billing script to step through each missing day and process that day’s invoices to get us caught up. To this day I still have dreams where I freak out that I haven’t done billing in a while.

If this all sounds labor intensive, crazy, and expensive, that’s because it was! It’s hard to say how many customers would have bailed on us if we had pushed them toward credit cards or charged $1 a month for a paper invoice. The monumental “if” was always: if we went card/ACH only, could we reduce costs enough to lower prices and attract more customers, offsetting the ones we would inevitably lose from the switch?

Homegrown Postscript invoices

Several years in (~2002) I realized Postscript was just text, and if I just shot Postscript at the printer, it’d print. This led me to switch from writing plain text invoices to using Postscript. This provided a much nicer looking bill; I could print logos and barcodes for the account numbers on them. The process was similar to writing text invoices and surprisingly simpler than I imagined. I used the PostScript::TextBlock and PostScript::Elements modules to generate strings to print on the page “canvas”. For example:

# Each TextBlock is a chunk of text to render somewhere on the page
my($p_head1) = new PostScript::TextBlock;

$p_head1->addText(  text => "CWIS Internet Services\n", font => 'Helvetica-Bold', size => 10, leading => 10 );

$p_head1->addText(  text => "203 North Broadway\nStigler, OK 74462",
                  font => 'Helvetica',
                  size => 8,
                  leading => 10
                );
...
# Start the Postscript document: header comments, the Code 39
# barcode font setup, then the page marker
print BILL "%!PS-Adobe-3.0\n";
print BILL "%%Pages: (atend)\n";
print BILL $code39;
print BILL "%%Page: $pages $pages\n";
# Write(width, height, x, y) returns the Postscript code plus any
# text that didn't fit; [ ... ]->[0] keeps just the code
$code .= [$p_head1->Write(252, 144, 396, 755)]->[0];
$code .= [$p_msg2->Write(108, 12, 410, 710)]->[0];
$code .= [$p_msg3->Write(108, 12, 370, 692)]->[0];
$code .= [$p_msg4->Write(108, 12, 374, 680)]->[0];
$code .= [$p_msg5->Write(108, 12, 389, 668)]->[0];
$code .= [$p_msg6->Write(108, 12, 421, 656)]->[0];
$code .= [$p_custaddr->Write(230, 60, 72, 702)]->[0];
$code .= [$p_acctnum->Write(72,12,452,692)]->[0];
$code .= [$p_invnum->Write(72,12,452,680)]->[0];
$code .= [$p_invdate->Write(72,12,452,668)]->[0];
$code .= [$p_pagenum->Write(72,12,452,656)]->[0];

print BILL $code;
...

For each invoice line item that was processed, I’d call some more library functions to add a row to the output, and then more function calls to write a footer. It could even do multiple pages. The barcode was just another font, the company logo was included from an .EPS file, and this was all written to a file on disk. Then all these pages were sent to the printer.

Generated Postscript invoice

Because I kept the Postscript and text files around, this provided a nice feature where customers could log into their account management page and see all of the exact invoices that had been sent to them in PDF format.

I had hoped the barcode would make processing payments easier somehow: just scan in the account number. In reality, by the time you reached over, scanned the barcode, and came back to key in the payment amount, it was just as quick to use the 10-key number pad to enter the info.

FreeRADIUS, MySQL, Self-service

I adopted FreeRADIUS fairly early on because I needed to write RADIUS accounting logs to a database for billing, which is what it was designed to do. As time went on, the set of PHP and Perl scripts to keep track of users made it possible to add/change/remove RADIUS authentication information in MySQL. This let me do automated account suspensions for non-payment: your dial-up account wouldn’t work, but you could still receive e-mails. When the switch was flipped, the past due accounts were ruled with an iron fist and people did not like this at all! At least I could blame it on “the system automatically doing it” and not something “we” personally did to you.
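
The idea looks something like this against FreeRADIUS’s stock SQL schema, where an Auth-Type := Reject row in radcheck makes dial-up auth fail while the mailbox keeps working. A sketch only; the customers table and grace period are invented for illustration:

#!/usr/bin/perl
# Sketch: suspend past-due accounts by giving them an
# Auth-Type := Reject entry in FreeRADIUS's radcheck table.
# The customers table and grace-period test are invented.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=radius', 'billing', 'secret',
                       { RaiseError => 1 });

# Three-day grace period past the due date.
my $pastdue = $dbh->selectcol_arrayref(q{
    SELECT username FROM customers
     WHERE next_due < DATE_SUB(CURDATE(), INTERVAL 3 DAY)
});

my $sth = $dbh->prepare(q{
    INSERT INTO radcheck (username, attribute, op, value)
    VALUES (?, 'Auth-Type', ':=', 'Reject')
});
$sth->execute($_) for @$pastdue;
# Re-activation after payment is just deleting that row again.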

We adopted Exim and Courier-IMAP as the e-mail subsystem because both could be backed by MySQL. Same story: the billing scripts could manipulate accounts here, set up new ones, and disable them as needed.

Eventually toward the end this all resulted in a customer self-service system. Customers could log into their account management web page and do things like add extra e-mail accounts, change credit card information, or make credit card payments to re-activate accounts. There was a modem connection information page that would look up what USR Total Control or Portmaster 3 you were connected to, query it via SNMP, and display the speed/protocol/error counters of your connection for troubleshooting. It all was pretty nice but took a long time to get there.

Specials and $9,999.99 bills

Around the same time of the Postscript rewrite, I added several other features to the billing programs. If you were a credit card customer, we’d automatically email you when your card was about to be billed, if it was about to expire, or if it was declined. We could send you e-mail invoices if you really wanted them. If you bounced a check, we mailed you and suspended your account. If you added new services to your account, this was written to a pending table to add to your next bill.

I had never imagined I’d need to code in the ability to do discounts or special sales. The few times we did some new subscriber specials it was a mess, because the billing script had to be manually altered to handle it. I forget what special we did, pay a quarter and get a month free or something, but it resulted in a bug that sent out invoices for $9,999.99. I would have thought a normal person would see this and think “oh that’s clearly a mistake”, but no, several people very much called us up and yelled at us for expecting them to pay $10k! Even if I caught the bills before they went out and scratched out the amount with a pen, people were still upset about it!

I thank god that Internet service was not subject to sales tax or other taxes like it was in Texas. I can’t imagine what it would have been like to implement tax code.

Fortunately, when we started selling DSL service, no extra tweaking of the billing system was needed; we could just drop in new product codes for the service and DSL modems and go. Around this time I finally gave in to competition and started offering “unmetered” dial-up plans along with lower cost metered plans.

My all-time favorite customer interaction was somebody that came into the office and very loudly started arguing with me that their bill was wrong, because September was not the 9th month. I had to list out the months on the whiteboard before they finally relented. Good times.

When it came time to close the business and have another company acquire all the accounts, this all blew up in my face because, for example, I didn’t have a good way to list who had pre-paid service and how many dollars were involved. It took me and somebody from the other side working well into the wee hours of the morning of the last day, poring over the database and building custom reports, to make sure the numbers were right before we closed.

1994 computer prices

Digging through some more files from my old backup tapes, I found some of the price lists I was giving out when I was building and selling computers. This also gives some idea of the version numbers going around at that time. I didn’t carry any inventory at all; these would have been bought somewhere like Sam’s Club or from some big retailer in Computer Shopper. I seem to recall doing about a $20-$60 markup on most software.

Procomm Plus 2.01 for DOS   $85        Procomm Plus for Windows  $120
QmodemPro for Windows       $89.95     

Corel DRAW! 5.0 for Windows   $599     Adobe Photoshop 2.5 for Windows $525
Broderbund Printshop Deluxe 1.2 for Windows   $69

Lotus Smart Suite 2.1 for Windows         $475  (1-2-3, AmiPro, Approach, Freelance Graphics, Organizer)
Microsoft Office Standard for Windows     $470  (Word 3.0, Excel 5.0, PowerPoint 4.0, Mail)
Microsoft Office Professional for Windows $557  (standard + Access)

Artisoft LANtastic 6.0  1/5/1/25/100 user  $115/$399/$725/$1389/$2230
Novell Netware 3.12 5/10/50/250 user       $635/$1340/$1940/$2540/$3699
Novell NetWare 4.01 25/50 usr  $2899/$3799

OS/2 2.1 for Windows       $90        OS/2 2.1  $175   OS/2 2.1 Upgrade  $130
MS-DOS 6.22 Upgrade        $58        
Microsoft Windows 3.11    $125

Borland C++ 4.0  $350  Borland Turbo C++ 3.0  $88  Turbo Pascal 7.0 for DOS  $135
Microsoft MASM  "ask"
MS Visual Basic 3.0 Professional for Windows  $328, for DOS $335
MS Visual C++ 3.5 for Windows Standard  $99
MS Visual C++ for Windows NT            $385

A small office/home office (SOHO) system configuration looked like this:

486DX2-66, VESA Local Bus
8 MB RAM
15" 1280x1024 SVGA monitor
Diamond Speedstar Pro VLB 1 MB video card
1.44 MB 3.5" floppy drive
540 MB EIDE hard drive
2x speed EIDE CD-ROM drive
Full-tower case
EIDE VLB controller
US Robotics Sportster modem
MS-DOS 6.22 and Windows 3.11

for an easy $1989.95!

I was usually marking my PC builds up by $200 and still undercutting most retailers. However, there was no way I could compete with the software bundles included with most retail PCs, which threw in something like MS Office, Encarta, or other stuff, unless I blatantly bootlegged the software. Which, many small PC builders did. I saw the fallout of this firsthand when running the ISP, walking people through setting up dial-up networking and needing their Windows 95 floppies/CD. They didn’t get one with their computer, which half the time left them in a boned state where DUN was half installed/half broken.

Personal PC

My personal computer which I ran my BBS on looked like this:

486DX-33
8 MB RAM
Windows 95
Maxtor 7245A  245 MB IDE hard drive, DriveSpace compression to get 350 MB
Seagate ST351A/X  40 MB IDE hard drive, for Linux
Sony CDU-55E ATAPI CD-ROM drive
VLB EIDE controller
VLB Diamond Stealth 64 2 MB video card
Soundblaster 16 Multi-CD sound card
Colorado Jumbo 250 MB tape backup

I had a second machine that was a 286, 10 MHz on a LAN with Personal NetWare.

For Linux experimenting on my PC, my notes say I was running Slackware with kernel v1.1.59, with the UMSDOS filesystem so Linux files and DOS files could live on the same FAT filesystem. Looks like I was using a boot floppy disk to boot into Linux; later I used LOADLIN.

Modems were a 1428VQE, the world’s most generic external 28.8k modem, and a Zoom V.32bis.


wcGate Satellite BAG support

I’ve been happily nerding out with vintage bulletin board system (BBS) software and UUCP to send/receive e-mail and newsgroups for a while. Something I kept seeing in BBS documentation was mention of the “.BAG format” or “UUCP BAG format” used by satellite providers when delivering Usenet feeds over satellite. I got curious: what exactly were these files, and what did they contain? The term BAG or bag file never appeared in any of my UUCP books, Taylor UUCP, nor the INN documentation. (Batching is mentioned, but not bags.) The file type seemed oddly standardized for such a niche application, and several BBS utilities claimed to understand .BAG files.

FNOS and InterGate with BAG format support

SAT 1.15 “Usenet BAG culling”


TL;DR: .BAG is simply an ASCII file containing a batch of newsgroup articles, as batched together for example by batcher(8) from INN, for easy delivery over UUCP. Also known as “rnews batching”. Instead of potentially thousands of individual files with one article each, it’s a sequence of larger files with a series of articles inside each. As a file exceeds a given size, a new file is written with an incremental filename. It’s not explicitly for UUCP, nor is it something that only satellite providers invented. It’s unclear if the bag term originated from satellite providers or DNews.


I’m familiar with satellite services such as Planet Connect and PageSat, which delivered a Usenet newsgroup feed, Fidonet, and other files over a one-way satellite link. These operated at speeds between 19.2 and 128 kilobits/second. In terms of BBS usage, the receiver software would get these so-called .BAG files over the air throughout the day, write them to disk, and then some sort of tosser software would import them into the BBS message conferences. These services weren’t limited to BBSs — ISPs, businesses, and anyone else that wanted Usenet newsgroups could use them for a news feed as an alternative to fetching over their expensive Internet connections.

What I didn’t know was what the format of these .BAG files was, and whether it was possible to re-create them to mock up a dummy “satellite” service. After all, at the end of the day it seemed like it was just a stream of bytes that came in over a serial connection. I had so many questions: Was this some sort of binary stream with things like variable values? Binary blobs? Were the .BAG files like .ZIP archives that contained different files inside, like a mix of usenet, Fido, and shareware programs? Just ASCII text? Was it just RFC-822 messages in a stream? Information was scarce, but after spending quite a while searching around I finally started uncovering answers.

One of the first things I found was this post to news.misc by Norman Gillaspie from PageSat in 1993, where he goes into detail on how their satellite feeds work. He mentions “These files are written … with a *.bag (for mailbag) file extension”. This got me wondering if these were just Unix mbox style files that contained usenet articles.

A while later, a very enlightening thing I found was a thread on The Unix Heritage Society mailing list from 2018, when somebody else asked the same question: “Does anyone know why UUCP ‘bag’ files are called ‘bag’?” The interesting part is that when they asked one of the authors of the HoneyDanBer UUCP software about “bag”, they said they had never heard of it, so the term was clearly coined somewhere else.

The thread went on to mention it was associated with the DNews software, which was popular for “suck” newsfeeds. (As opposed to IHAVE feeds, where an NNTP server offers up all articles to a peer, a suck feed requests articles and stores them on the local news server, saving the bandwidth of taking a full feed.) For outbound feeds, DNews offered the ability to write batches of articles to a “bag file” or “rnews uucp bag” as of version 2.7. The documentation has a section called “writing uucp bag files“. The format of the .BAG file is even mentioned in the mailing list thread:

    The BAG/UUCP file format is:

    #! rnews nnnn
    ...(article, exactly nnnn bytes, counting each end of line as one byte)
    #! rnews nnnn
    ...(next article)...

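Given that format, un-bagging a batch is almost trivial. A minimal perl sketch to split an uncompressed bag back into individual article files (assuming Unix line endings, since the byte counts count each end of line as one byte):

#!/usr/bin/perl
# Split an uncompressed .BAG / rnews batch into individual article
# files, based on the format above.
use strict;
use warnings;

my $bag = shift or die "usage: $0 file.bag\n";
open(my $in, '<', $bag) or die "open $bag: $!\n";
binmode($in);

my $n = 0;
while (my $hdr = <$in>) {
    # Each article is preceded by "#! rnews nnnn", where nnnn is the
    # exact byte count of the article that follows.
    next unless $hdr =~ /^#!\s*rnews\s+(\d+)/;
    my $len = $1;
    read($in, my $article, $len) == $len
        or die "short read, truncated bag?\n";
    my $out = sprintf("art%04d.txt", ++$n);
    open(my $fh, '>', $out) or die "open $out: $!\n";
    binmode($fh);
    print $fh $article;
    close($fh);
}
print "extracted $n articles\n";
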
This all kind of makes sense: if you’re a satellite data company and need to send files over the air, why invent some new binary file format? It’s the 90s and we haven’t gotten around to building over-complicated tech cathedrals; just do the simplest thing possible. Take the output from your news software as if it were sending to a UUCP site and shove it up to the satellite.

Admittedly, as the thread ends and my own research concludes, it’s still not clear exactly how the term “bag” originated. I have a hunch it was probably the satellite biz (maybe PageSat?), because news admins appear to have already had their own term for batching (“rnews batch”). I guess somebody just called it a bag and it stuck. I dug through the earliest release notes I could find for DNews and couldn’t find any further information about how their bag support came about.

File naming

I came across this old web page “Building a Satellite-News-Feed (UseNet, PlanetC)” by Juergen Helbing, who goes into detail about his woes trying to get Planet Connect going in Germany. He talks about using DNews to import news from Planet Connect to his news server. There were several very interesting bits of information in this post. One was the file names used by Planet Connect:

PlanetC send out two different names of BAG-Files:

news####.zip   (NewsGroups) and
pcbin###.zip   (binary Groups)

This indicates there were different filenames based on the type of content. Here news####.zip were zipped .BAG files containing batches of text articles, and pcbin###.zip were zipped .BAG files containing batches from the binary groups. This was written in 1997, and he mentions Planet Connect was sending down about 400 MB of compressed articles a day, which uncompressed to about 1.5 GB per day.

Also interesting was his description of how the Planet Connect terminal/receiver actually worked. Once the satellite dish, LNB, decoder, and software were all installed, the software immediately started reading the stream; it had enough information to know when one file stopped and the next started, and just started writing files to disk. I think of this as analogous to a Zmodem/Ymodem-batch download, both of which are protocols with headers that carry the filenames+sizes. Except it’s a one-way transmission with no acknowledgements, and if there are CRC errors (he mentions this) that file is screwed, as there’s no way to re-request bad blocks. I wondered if Planet Connect and other services re-broadcast files in case they were corrupt; reading an old FAQ indicates they sent some files at least twice a day, but not the newsfeed itself.

Creating a .BAG with INN and send-uucp

I’ve been running the INN news server software in conjunction with my BBS to send test private newsgroup posts over UUCP. I wanted to look closer at the files being sent to see if they used this same batch format. Could a batch outfeed file from INN be handed to the BBS as a .BAG file?

First thing I discovered was that my INN wasn’t set up correctly. I needed to be running send-uucp every hour on my news server to, well, batch up articles for an outbound news feed to the BBS over UUCP. This explains why BBS->INN posts worked, but I wasn’t seeing anything going INN->BBS! I also discovered I should have been filtering the news feed on the bang path; on the first run of send-uucp I was back-feeding thousands of test posts from the BBS back to itself!
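
The fix was just an hourly cron entry on the news server, run as the news user (Debian-style INN path shown; adjust for your install):

# batch up outbound UUCP news feeds every hour
0 * * * *   /usr/lib/news/bin/send-uucp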

But sure enough, after running send-uucp, I went and looked in my /var/spool/uucp/tuxedocatbbs/ directory for stuff spooled for the BBS. The resulting data file D.007N contained the same sort of rnews formatting mentioned on the TUHS list and in the DNews docs.


#! rnews 506
Path: news.wann.net!.POSTED.localhost!localhost!bwann
From: bwann@wann.net
Newsgroups: tuxedocat.test
Subject: Test 3/14 post
Date: Fri, 14 Mar 2025 14:56:29 -0700
Organization: wann.net
Message-ID: <c5f44a0f-6a69-3235-e9f4-5eb692d8d52d@wann.net>
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset=US-ASCII
Injection-Info: uucp.wann.net; posting-host="localhost:::1";
	logging-data="168550"; mail-complaints-to="usenet@uucp.wann.net"
Xref: news.wann.net tuxedocat.test:10874

happy pi day!
#! rnews 505
Path: news.wann.net!.POSTED.localhost!localhost!bwann
From: bwann@wann.net
Newsgroups: tuxedocat.test
Subject: 3/14 test again
Date: Fri, 14 Mar 2025 14:56:54 -0700
Organization: wann.net
Message-ID: <80d63cde-8ccf-b661-fc50-3447d089ea70@wann.net>
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset=US-ASCII
Injection-Info: uucp.wann.net; posting-host="localhost:::1";
	logging-data="168697"; mail-complaints-to="usenet@uucp.wann.net"
Xref: news.wann.net tuxedocat.test:10875

3.14!  meow

Fake .BAG import to BBS

I wanted to test my theory by seeing if I could just take this data file from the UUCP spool and import it to the BBS as a fake .BAG file. Instead of beaming the file up and down over satellite, I just scp’d it to the BBS system and renamed it.

wcGate is the software used by Wildcat! BBS to import/export messages from UUCP to the BBS message base, and includes support for satellite feeds. I created a new dummy UUCP provider called FAKESAT:

Creating dummy satellite service in MakeGate


I copied the raw data file over to the BBS into the incoming Satellite directory configured in MakeGate:

Next I ran wcgate import uucp h:fakesat to try to import my newly created .BAG file:

wcGate import of .BAG file

And it imports my fakesy .BAG file with test newsgroup posts to the BBS! Success!


Also good to know wcGate understands duplicate newsgroup posts, so I can go hog wild and set up multiple news feeds and not worry about the same article popping up repeatedly.

wcGate handling duplicate newsgroup posts

For what it’s worth, when wcGate exports messages to transmit over UUCP, it uses a similar batch format, except the batch is compressed, as noted by the #! cunbatch header:

wcGate compressed cunbatch file
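
Unpacking one of those is straightforward, assuming the body after the header line is classic compress(1) LZW data, which gzip -d can also decode. A sketch:

#!/usr/bin/perl
# Unpack a "#! cunbatch" compressed batch into a plain rnews batch.
# Assumes the body after the header line is compress(1) LZW data.
use strict;
use warnings;

my $file = shift or die "usage: $0 batchfile\n";
open(my $in, '<', $file) or die "open $file: $!\n";
binmode($in);

my $first = <$in>;
die "not a cunbatch file\n" unless $first =~ /^#!\s*cunbatch/;

open(my $gz, '|-', 'gzip -dc > batch.rnews') or die "spawn gzip: $!\n";
binmode($gz);
local $/;                      # slurp everything after the header
print $gz scalar <$in>;
close($gz) or die "gzip failed\n";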

Other file types

I haven’t seen the Planet Connect or PageSat decoder software in operation, but presumably there are other file name series for other things such as FidoNet posts, stock quotes, and whatever else they sent across the link.

A friend showed me a video with a clip of a consumer service called SkyLink (offered by the same people as Planet Connect) in operation, which gives some idea of how file downloads from satellite could work. Apparently they sent a manifest in advance (daily? weekly?) of the files that would be sent over satellite. Using their software you would mark which files you were interested in, and as they came in throughout the day they would be saved to disk. The example video showed a list of .ZIP, .ARC, and .QWK files among others that could be saved. At the bottom is a status pane with filenames, block and error counts:

According to an article about Planet Connect in Boardwatch Magazine (Jan 1994), Fidonet files were named `0000FFFF.M01` and required a TICK processor to import them.

It seems like it should be straightforward, if not at least possible, to mock up a dummy satellite service over a null modem serial connection to relive the experience. There needs to be something to assemble the file manifest and periodically “upload” it, and then something on the BBS/receiver side continually decoding files as they come through and writing them back out as individual files. Again, something very similar to Ymodem-batch or Zmodem. This could be used to send a dummy newsfeed, or dummy stock quotes, weather images, or cat memes.
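
To gauge how little the receiver side would need, here’s a toy sketch. The framing (a “FILE name size” header line before each file’s raw bytes) is completely made up, not what Planet Connect actually did:

#!/usr/bin/perl
# Toy receiver for a pretend satellite feed over a null modem. The
# framing is invented, and like the real thing there are no
# checksums or retries: a corrupt file stays corrupt.
use strict;
use warnings;

open(my $port, '<', '/dev/ttyS0') or die "serial port: $!\n";
binmode($port);

while (my $hdr = <$port>) {
    next unless $hdr =~ /^FILE (\S+) (\d+)/;
    my ($name, $size) = ($1, $2);
    $name =~ s{.*/}{};    # never let the sender pick directories
    read($port, my $data, $size) == $size
        or warn "short read on $name, corrupt transfer?\n";
    open(my $out, '>', "incoming/$name") or die "$name: $!\n";
    binmode($out);
    print $out $data;
    close($out);
    print "received $name ($size bytes)\n";
}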

Bonus: while reading up on usenet and uucp, I learned that in 1983, when Australia first joined usenet, articles were written to tape and FLOWN to the University of Sydney, where they were ingested and distributed. This is also mentioned at the end of the O’Reilly Managing Usenet (1998) book under “Last-Resort Transmission Methods”.

The other night I was flipping through an old 1995 Computer Shopper, like you do, and wondered what the largest hard drive for sale at the time was.

Turns out it’s a Seagate ST410800N: 9 gigabytes, 5.25″ full height, SCSI. In other words, the biggest physical form factor PC drive. Virtually all other new hard drives around this time were 3.5″, either in the half-height or now familiar 1″ height format, but this dude was 5.25″ FH. I do have experience with this form factor, such as the classic 5 megabyte ST-506 MFM and the 10 megabyte models that came in some of the IBM XTs. This one just stored about 900 times the data of those 10 megabyte models.

Hard Drives International 1995

I kind of wanted a SCSI drive for a Novell NetWare server I had been considering building, so I could finally learn more about old SCSI. After seeing this, why shouldn’t I put the biggest, highest capacity SCSI drive in my file server? Off to eBay I went and found a really good deal on one and bought it.

The next day the eBay seller actually called me up and asked if I intended to use this or just scrap it for precious metals. I can’t imagine there’s that much rare metal in these, but I don’t know. I told him I wanted to take a gamble on trying to use it, and he said “alright, I’ll do a good job packing it for you then.”


Sure enough, a big box arrives containing packing and another box, which contained more packing and the hard drive.

I finally got around to dissecting one of my 486s to test out the drive on an Adaptec 2840VL local bus SCSI card, and hooked it all up on my coffee table.

I wasn’t sure if it was actually going to work, but I knew it would probably be a loud drive, so I recorded video and flipped the switch to turn it all on. Sure enough, it took several seconds for the drive to get up to operating speed, with a loud metallic ping of the heads releasing and then the noise of seeking.

Adaptec AHA-2840VL and ST410800N

Miraculously, the previously untested Adaptec AHA-2840VL SCSI controller just worked without needing any jiggling or finagling of my touchy VL-bus slots, and the hard drive appeared on the SCSI bus.

Media verify

From the Adaptec SCSISelect utility I started a media verify test to see how much damage the drive had. This ran for a few hours and either found zero bad sectors or quietly reallocated them; the entire drive was still usable!

I installed MS-DOS on it; fdisk and formatting had no problems with it being a 9 GB drive. Fdisk just created a 2 GB primary partition and MS-DOS used all of that as C:. I installed Norton Utilities and SpinRite on it too and let them run for a while; they seemed happy with the drive as well.

As far as drive performance, SpinRite tells me it’s doing about 3 megabyte/second transfers. I’m not yet an expert at SCSI, but the SCSI-1 spec is 5 MB/sec, so I’d expect something around there, especially with the VL-bus SCSI adapter. I haven’t figured it out yet; it may be a termination or cable problem.

What I figured out the next day was that the PCB should have been on top, so I had been running this thing upside down all night long. On all previous drives I’ve used like this, such as the ST-506 and XT drives, the PCB was on the bottom. After turning it right side up it went squirrelly for a few boots. A couple of times the Adaptec complained the host adapter wasn’t found, and a few times MS-DOS said the drive was read-only. Eventually whatever was stuck or misaligned fixed itself and it’s been running fine in the correct orientation ever since.

I had shot some video of the formatting and verification, threw together a short video of the drive powering on, and uploaded it to Instagram reels.

Popularity

What I wasn’t expecting was the IG reel to take off like wildfire in popularity. Most of my reels get a couple dozen views; this thing suddenly got thousands. The likes and comments started pouring in, and soon I had 100k views. People calling each other names. Then 250k, then 500k. Now I’m over 700,000 views, nearly a thousand comments, and 28k likes. Clearly the sound of this thing spinning up got people’s attention!

Looking through the comments, the breakdown seems to be something like this:

  • 80% posting a comment about how it sounds like an air raid or tornado siren, or the THX logo.
  • 15% can’t believe old tech used to be so loud/big/low capacity/expensive (I mean this was released 31 years ago), or maybe they haven’t been around long enough to see that today’s storage will be likely obsolete too.
  • 2% bro why don’t you just buy a 2 TB flash card for $20
  • Several people commented this exact same drive was used in Avid Media rigs they used for non-linear video editing and rendering. Sizes seemed to range from a couple of drives to an entire rack of 42 drives on multiple SCSI controllers.
  • At least three people said it was a fake or said it’s clearly an IDE drive dummy, despite the bigass 50-pin ribbon cable and the “Adaptec SCSI” screen.
  • At least one person stole the video, cropped it, and re-posted it to Threads as their own video.

The Avid comments were interesting. I tried to find photos of mid-1990s Avid setups but came up short.

The drive is still pretty noisy even inside a case so I’m not sure how well that’s going to work for a full time NetWare server. It might have to sit around for special occasions.

The video:

View this post on Instagram

A post shared by Bryan (@bryanwann)

(jesus that’s a lot of css for an embed, IG)

If you find yourself with a Cisco IR809G (and probably other Cisco IR800-series industrial integrated services routers) from eBay, you’ll probably need the DC power plug/connector. It took me a while to find these, as the Cisco part number is useless in the 3rd party world.

Cisco IR809G power terminals

Mouser lists them as “Pluggable Terminal Blocks 3.81mm euro plug 4 position” made by Molex, Mouser P/N 538-39519-0007, which is where I got mine for $2.38 as of 2024. I guess the pin pitch is the important thing. I was only familiar with 5/5.08mm Phoenix connectors where there was no septum thingy between pins, whereas at 3.5/3.81mm the pins are enclosed individually. Also TIL Phoenix and Europlug and Euroblock connectors are all the same thing, depending on who you ask and who makes them.

Radwell lists them as the 29-6115-01-A0 for $16, ooof. Which is still better than the dude I saw selling them on eBay for $45.00 each!

I’ve been tinkering around with other vintage projects the last several months. Many of the photos land over on Flickr but I’m pretty sure the googles never index them, so they’re largely undiscovered and I haven’t written widely about them and why I care. So here’s my “life story before the recipe” list.


Logitech ScanMan 32

Trying out the ScanMan 32 on my Macintosh IIsi

[photos – flickr: 2024-10 Logitech Scanman32]

This is a hand-held greyscale (32 shades of grey!) scanner that came out in 1992-ish. You had to move it down a page by hand, could only scan about 5″ wide, and only up to 400 DPI. Dad bought one for some reason I don’t really remember; either he wanted to OCR scan books or notes and it didn’t work like he expected, or he just saw it at Sam’s and wanted to tinker with it. At the time we had a 286 computer with a CGA display, and while the scanner worked with the computer, it didn’t work all that well. OCR was done by a DOS program called Catchword. It was the first time I had ever used OCR and it seemed somewhat magical, though it still left a lot of hand editing to clean up the text.

Graphics and photo scanning was rough; at 300-400 DPI this was far more resolution than the CGA monitor could display. Something just an inch or two wide could easily spill off the side of the screen, and it was a lot of work to attempt to stitch together a single page. Eventually, a few years later when I had a faster computer with a VGA monitor, I put it to use scanning various clipart and had my own little library of .PCX files I made computer catalogs with. It was quite satisfying to embed a filename in WordPerfect 5.1 for DOS (this was before WYSIWYG) and see a rudimentary photo come off the dot matrix printer.


A couple of summers ago I found the thing in a box in the barn where it had been sitting for probably 20 years. It looked like it was in good condition, so I was curious if it still worked. It didn’t have the ISA interface card, so I wound up eBaying one. Eventually this year I finally got around to hooking it up to my 486. I installed an old DOS version of Logitech GrayTouch 1.0 and discovered the thing indeed still worked!

I quickly realized old versions of the software that went with this are missing and hard to find. The only version of GrayTouch I could find was on archive.org, the Dutch version at that. From my old tape backups I had GrayTouch 2.0 but not the whole installer set. I found somebody on eBay who was selling it — at $15 a disk and it was a 3-4 disk installation.

Settings for the ISA card are almost lost too. None of the manuals are on Archive.org. I found one page, “Trevor’s Unofficial Q&A Page – Logitech Software FAQs”, on archive.org that had the DIP switch settings for the ISA interface card. These set the I/O base address, which the software has to know about.

Logitech Scanman Plus ISA Controller Board 200074 DIP switches

Scanman Plus ISA DIP switches board 200074

While in the process of researching how to get this thing going again, I found out there was a Macintosh version of it. Specifically, a SCSI interface box that hooked up to the same ScanMan 32 and let you plug it into the SCSI port of a classic Macintosh. Of course, now that I have an SE and a IIsi, I had to try it out. I found the box, “H7M-1”, on eBay for $20 and decided to try it out. It also included another ScanMan 32 and the Mac manuals. So now I have two of these damn things.

The Mac software was a little easier to find, and getting it running was fairly straightforward.

I have manuals for Logitech PaintShow Plus, ScanMan Mac, and ScanMan Plus; I need to get them fixed up and uploaded to archive.org. One looks like it got wet, but it’s otherwise legible. If I run across a good version of the DOS manuals and software, I for sure want to nab it to scan and archive.

I’ve pondered making a video of this scanner in action on PC and Mac; there aren’t many about it. Incidentally, Cathode Ray Dude came out with a video about scanners that briefly mentions the Logitech ScanMan, so maybe that’s all the coverage it needs.

Harris TS22ALO butt-set

[photos: flickr – 2024-10 Harris TS22ALO repair]


Ever since the ISP days I wanted a butt-set to test phone connections with. They were a few hundred dollars, so I never bought one and instead carried around a $12 princess phone. I recently decided I’m an adult and I can buy one if I want to! I bought this on eBay for like $25, “not working, parts only”. Doing a little reading revealed these things have two batteries. One is the normal 9 volt battery that provides working voltage; the other is a CR2032 battery that serves internal functions, and if it dies the unit is inoperable.

There’s like one dude on YouTube that has videos of repairing these things and he completely skips over the part of how to actually take them apart. It’s very much draw the rest of the fucking owl. I took several photos along the way so that’ll have to do in lieu of my own teardown video.

The goal is to open the thing up and replace the CR2032 battery that’s inside it. The problem is these handsets are designed to be dropped off a 20′ telephone pole into a lake and survive, so they’re very ruggedized. The entire PCB and all the components, including the battery, are coated in this thick, goopy, rubber-ish plastic coating that’s a complete pain in the ass to get off. It looks like hot glue and you can take little nibbles with needle nose pliers, but it’s not hot glue and you can’t melt it. Cutting it dulls blades pretty fast too. I don’t know if there’s anything like acetone or gasoline that might dissolve it. If only YouTube repair guy would tell us his secrets.

I eventually started slicing around the battery with a box cutter, very much cutting away from my fingers. I would get a slab of it, start twisting it over my needlenose pliers, and eventually was able to peel away the stuff chunk by chunk to get the battery exposed.

The battery holder is spot welded directly to the battery, so it had to be replaced. There are three points of contact with the PCB: two posts on the edge (positive), and one underneath (negative).

Negative lug cut on battery

I bought some cheap CR2032 holders from Amazon and wired up a jumper so both positive contacts on the PCB made connection, plus the negative.

Put it all back together and it works!  The audio is kind of scratchy; I don’t know if parts have drifted out of spec or it was always like this. But it sure is loud, and I won’t have any problems hearing it in a machine room!

DHCP OFFER with both lpxelinux.0 and grubx64.efi boot-file-names

TIL a DHCPv4 server can respond with two different TFTP boot-file-names in a single DHCPOFFER packet. And how the second filename can get corrupted with extra junk that shows up as a PXE client trying to download a slightly wrong file from your TFTP server.

TFTP request with 0xFF at the end of the filename

The latter I’ve seen before, but I don’t think I ever dug into figuring it out. Again, this is more interesting stuff I’ve uncovered switching from ISC DHCP to ISC Kea. Here I will try to explain where the mangled TFTP filename came from and how to avoid it.

I was trying a DHCPv4 server configuration to support both UEFI PXE clients and some old legacy BIOS-based motherboards. In old ISC DHCP this is usually done with a class to match on the vendor class or the processor architecture (code 93). If it’s 0x00 0x07, return in the DHCP OFFER a file-name of a UEFI network boot program such as syslinux.efi or bootx64.efi, else return a file-name of something like lpxelinux.0:

# ISC DHCP
# (assumes "option arch code 93 = unsigned integer 16;" was declared elsewhere)
class "pxeclients" {
  match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";

  if option arch = 00:07 {
    # x86-64 UEFI client
    filename "/efi64/syslinux.efi";
  } else {
    # PXELINUX >= 5.x is the new hotness with HTTP/FTP
    filename "/bios/lpxelinux.0";
  }
}

I was trying to do this same thing over in ISC Kea using a client-class:

  "Dhcp4": {
  ...
    "boot-file-name": "/bios/lpxelinux.0",
    "next-server": "192.168.130.10",
  ...
    "client-classes": [
      {
        "name": "grubx64_efi",
        "test": "option[93].hex == 0x0007",
        "option-data": [
          {
            "name": "boot-file-name",
            "data": "/efi/grubx64.efi"
          }
    ...
  ...

Except when I tried to UEFI PXE boot my system over IPv4, two unexpected things happened:

TFTP request with 0xFF at the end of the filename

Wireshark from the tftp server showing the request filename

First, the UEFI TFTP client was asking for a filename with an extra character (0xFF) at the end. This showed up in both syslog on the tftp server and in a packet capture on the tftp server showing the extra 0xFF at the end. Others on the internet have mentioned other termination characters, such as unicode U+FFFD. This was causing PXE booting to fail because the target system couldn’t fetch the bootloader program. In this case I’m still testing with a SuperMicro A1SAi motherboard, as in prior posts.

Second, when I ran packet captures to verify the filename being sent in the DHCP OFFER and make sure it wasn’t garbage, there were TWO boot filenames being returned in two different spots in the same packet! Both my /bios/lpxelinux.0 and /efi/grubx64.efi paths were being offered. wtf?

I started searching around and found two enlightening threads, one on the Mikrotik forums and one on the Ubiquiti forums, that addressed my weird filename format. Others have seen this behavior too, and it shed some light on the problem. It comes down to this: if the boot file-name was included as an option (this part is key; in this case option 67), some UEFI PXE TFTP implementations expect it to be a null-terminated string, whereas the DHCP server terminates the field with the end-of-options marker 0xFF. In other words, the UEFI should be respecting the option’s data length field and terminating the string appropriately instead of reading too many bytes.

Thus what I was seeing was the UEFI reading beyond the expected end of the filename, picking up the 0xFF marker, and then trying to TFTP request the file “grubx64.efi<FF>”.
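
As a rough illustration, and purely my guess at the failure mode rather than anyone’s actual firmware code, here’s a little Python sketch of the difference between honoring option 67’s length byte and scanning for a terminator that isn’t there:

  # Toy DHCP options buffer: option 67 ("grubx64.efi", 11 bytes) followed
  # immediately by the end-of-options marker 0xFF.
  opts = bytes([67, 11]) + b"grubx64.efi" + bytes([0xFF])

  # Correct: bound the string by the option's length byte.
  length = opts[1]
  good = opts[2:2 + length]          # b'grubx64.efi'

  # Buggy (what the firmware appears to do): scan for a NUL terminator
  # that never arrives, swallowing the 0xFF marker along the way.
  end = 2
  while end < len(opts) and opts[end] != 0:
      end += 1
  bad = opts[2:end]                  # b'grubx64.efi\xff'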

This got me into reading up on the format of DHCP OFFER packets, and I discovered the second issue. In RFC2131, DHCP OFFER headers have fixed-length fields for “siaddr“, the “next-server” or TFTP server IP address, “sname“, an optional server hostname, and “file“, a 128-byte field that holds a boot filename. These fields are null-terminated.

RFC2131 DHCP format

HOWEVER, in RFC2132, which lays out the various DHCP options that can be specified, we get to option 67. This DHCP option “is used to identify a bootfile when the ‘file’ field in the DHCP header has been used for DHCP options.” Here the raw format is the option code 67 (0x43) + the length of the filename + the filename itself. Note the lack of null termination.
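
For example, sending /efi/grubx64.efi (16 characters) as option 67 would look like this on the wire:

  43 10 2f 65 66 69 2f 67 72 75 62 78 36 34 2e 65 66 69
  |  |  /  e  f  i  /  g  r  u  b  x  6  4  .  e  f  i
  |  +-- length = 16 (0x10)
  +-- option code 67 (0x43)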

The way I read the RFC, the TFTP filename can either be in the original DHCP OFFER header, a/k/a the “fixed fields”, or specified later as a variable-length DHCP option, but not both at the same time.

This seems to be a source of a lot of confusion for people trying to troubleshoot their PXE boot configurations. It seems many, like myself, do not know there are two fields, and keep hammering away fiddling with filenames when it’s not clear which one they’re actually setting.

Bonus: see below where I try adding some dummy Option 68 data; it still breaks.

This got me back to reading the Kea docs again to find out what was wrong with my configuration. I caught on to the fact I was using a global “boot-file-name” and then specifying “boot-file-name” again as option 67 in my client-class.

The configuration options in Kea are literally named the same thing, so they should be the same thing, right? RIGHT??

No. Buried in 8.2.18.1 Setting Fixed Fields in Classification, it turns out they are very much different. In order to set the boot-file-name in the OFFER header, I needed to ditch the option-data block and set “boot-file-name” again at the client-class scope, like this:

  "Dhcp4": {
  ...
    "boot-file-name": "/bios/lpxelinux.0",
    "next-server": "192.168.130.10",
  ...
    "client-classes": [
      {
        "name": "grubx64_efi",
        "test": "option[61].hex == 0x0007",
        "boot-file-name": /efi/grubx64.efi"     <<< note not in an option-data block
      }
    ...
  ...

I guess technically, if the header’s file field were full, it would make sense to give this option the same name, since it serves the same purpose.

Also, for whatever reason, the examples in the Kea documentation mention things like "boot-file-name": "/dev/null", which might lead you to believe this leaves the field empty. But no, it quite literally sends the string /dev/null as the filename to the target system in the DHCPOFFER.

Winning!

This gets us back to returning a single TFTP boot file-name in the fixed fields of the DHCP OFFER packet, it’s null-terminated, and when the target system UEFI PXE boots, it requests a valid filename. And in this case the client-class test does the right thing: it detects that the target system is UEFI and sends the /efi/grubx64.efi boot-file-name instead of /bios/lpxelinux.0.  Winning!

Wireshark of DHCP OFFER with only grubx64.efi

and here’s the happy target system:

>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv4.
  Station IP address is 192.168.135.28

  Server IP address is 192.168.130.10
  NBP filename is /efi/grubx64.efi
  NBP filesize is 2541096 Bytes

>>Checking Media Presence......
>>Media Present......
 Downloading NBP file...

  Succeed to download NBP file.


But why?

While this fixes my problem, it doesn’t address the seeming impedance mismatch between how the DHCP RFCs say the filename is specified and why UEFI implementations seem to do their own thing by tacking on extra characters such as 0xFF. Surely these two standards groups must talk to each other?

Cracking open the UEFI 2.6 Specification, my favorite reading as of late, this lives in “Network Protocols – ARP, DHCP, DNS, HTTP and REST”. The EFI_DHCP4_HEADER structure mentions BootFileName[128]. Then right after, EFI_DHCP4_PACKET_OPTION clearly gives the format of “option code + length of option data + option data”. So the format of options as laid out in RFC2131/2132 is acknowledged here. But it really doesn’t mention string termination, and I assume that’s left as an implementation detail.

The PXE specification doesn’t really mention string termination either.

RFC2132 clearly states that we shouldn’t be adding our own null termination in DHCP options. That is, we shouldn’t be trying to set boot-file-name to something like “/efi/grubx64.efi\0” in an attempt to trick the UEFI into using the “correct” filename:

Options containing NVT ASCII data SHOULD NOT include a trailing NULL; however, the receiver of such options MUST be prepared to delete trailing nulls if they exist. The receiver MUST NOT require that a trailing null be included in the data. In the case of some variable-length options the length field is a constant but must still be specified.

The open source UEFI reference implementation, TianoCore EDK II, takes the stance that RFC2132 doesn’t guarantee the option is null-terminated, which seems to conflict with the paragraph above saying the option shouldn’t be null-terminated to begin with. In any case, they take the boot-file-name from the DHCP OFFER, and if it came in as Option 67 they use the option’s length to null-terminate it themselves; if it’s the fixed field they use it directly: (NetworkPkg/UefiPxeBcDxe/PxeBcDhcp4.c)

  //
  // Parse PXE boot file name:
  // According to PXE spec, boot file name should be read from DHCP option 67 (bootfile name) if present.
  // Otherwise, read from boot file field in DHCP header.
  //
  if (Options[PXEBC_DHCP4_TAG_INDEX_BOOTFILE] != NULL) {
    //
    // RFC 2132, Section 9.5 does not strictly state Bootfile name (option 67) is null
    // terminated string. So force to append null terminated character at the end of string.
    //
    Ptr8  =  (UINT8 *)&Options[PXEBC_DHCP4_TAG_INDEX_BOOTFILE]->Data[0];
    Ptr8 += Options[PXEBC_DHCP4_TAG_INDEX_BOOTFILE]->Length;
    if (*(Ptr8 - 1) != '\0') {
      *Ptr8 = '\0';
    }
  } else if (!FileFieldOverloaded && (Offer->Dhcp4.Header.BootFileName[0] != 0)) {
    //
    // If the bootfile is not present and bootfilename is present in DHCPv4 packet, just parse it.
    // Do not count dhcp option header here, or else will destroy the serverhostname.
    //
    Options[PXEBC_DHCP4_TAG_INDEX_BOOTFILE] = (EFI_DHCP4_PACKET_OPTION *)
                                              (&Offer->Dhcp4.Header.BootFileName[0] -
                                               OFFSET_OF (EFI_DHCP4_PACKET_OPTION, Data[0]));
  }


So at least their implementation does the right thing as far as we’re concerned, and doesn’t feed trailing characters to the TFTP server.

I know this isn’t the case with the UEFI on my SuperMicro test motherboard. If there’s a boot file name in option 67, it’s gonna get screwed up.

Notably, this doesn’t seem to be a problem with DHCPv6 PXE booting, as DHCPv6 doesn’t use the same sort of fixed fields in its ADVERTISE messages.

Further testing overruns with option 68

12/9: I wondered what would happen if I added yet another option to my OFFERs right after Option 67. Would the UEFI loader figure out where to stop reading option 67, or would it keep reading beyond the end of the field? I configured Kea to send option 68, “Mobile IP Home Agent”. The name and purpose don’t matter; I just wanted the next numerical option so the data would be adjacent in the packet.

Here’s what the new OFFER looks like with some dummy option 68 data:

DHCP OFFER with filename, Option 67, and Option 68

and here’s the hex representation of it in the packet:

Hex payload of Option 67 and 68

We have 0x43 (decimal 67, for option 67), length 12, “testfilename”. Then immediately after we have 0x44 (decimal 68, for option 68), length 4, followed by the bytes of an IPv4 address, c0-01-02-03 (192.1.2.3), and finally our 0xFF end-of-options marker.

What does the Supermicro UEFI TFTP client do? It surprisingly reads beyond the end of option 67 and keeps going, using the option 68 data as part of the TFTP boot-file-name! All the way to the end of the DHCP packet again, including the 0xFF terminator.

UEFI reading both option 67 and 68 data for boot-file-name!

This shows up in the TFTP server log as the original “testfilename” followed by the ASCII representation of the option 68 data.

Conclusions and workarounds

The TFTP filename getting stuff appended to the end seems to be yet another UEFI implementation bug, as others on the internets claim. If you’re having this problem, your best bet is to avoid using DHCP Option 67 and configure your DHCP server so the boot-file is set directly in the DHCP OFFER header. In ISC DHCP this is the plain “filename” directive. In ISC Kea, it’s “boot-file-name” set as a fixed field (globally or at the client-class scope) as shown above. In dnsmasq (I haven’t personally tested this) it seems to be the “dhcp-boot” directive.
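
For what it’s worth, here’s an untested sketch of what that might look like in dnsmasq, going by its documentation; dhcp-boot with no option override is supposed to populate the header’s file field:

  # dnsmasq (untested sketch): tag UEFI x64 clients by architecture
  # option 93, then pick the boot file per tag via the header field
  dhcp-match=set:efi64,option:client-arch,7
  dhcp-boot=tag:efi64,/efi/grubx64.efi
  dhcp-boot=/bios/lpxelinux.0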

The Windows DHCP server seems to be a big source of confusion. Practically every example I can find for Windows Deployment Services says to use Option 67, and I’m not even sure there’s a way to set the header field. I don’t have a Windows server handy to check.

The only advantage I can see to using Option 67 over the fixed-field name is length: the fixed field is 128 bytes, so 127 usable once you account for the null terminator, whereas Option 67 allows up to 255 bytes.

Another option is to UEFI PXE boot over IPv6, which avoids this problem altogether.

There are certainly some clever workarounds out there, such as making symlinks on the TFTP server so that, for example, “grubx64.efi<FF>” links to “grubx64.efi”. While that may work, it seems too hackish even for me.
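
For completeness, the trick is that the bad name ends in a literal 0xFF byte, so creating the link takes bytes paths; something like this Python sketch would do it (the /srv/tftp/efi path is just a stand-in for your TFTP root):

  import os
  # The broken client requests the filename plus a literal 0xFF byte,
  # so create a symlink whose name ends in that byte.
  os.chdir("/srv/tftp/efi")                       # stand-in TFTP root
  os.symlink(b"grubx64.efi", b"grubx64.efi\xff")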

There may be other UEFI things out there that need to chain boot and explicitly want Option 67. I don’t know offhand what those could be, but anyone can do anything in software.

Links

  • https://forum.mikrotik.com/viewtopic.php?t=58039
  • https://community.ui.com/questions/Network-Boot-adding-characters-to-file-name/cffe7862-dbc7-42e8-bb09-1ef3366fef9c
  • EDK II reference: https://github.com/tianocore/edk2/blob/master/NetworkPkg/UefiPxeBcDxe/PxeBcDhcp4.c
  • UEFI 2.6 Specification: https://uefi.org/sites/default/files/resources/UEFI%20Spec%202_6.pdf

Years behind schedule, I finally got around to replacing ISC DHCP with Kea DHCP so I could finally have proper IPv6 host reservations. What I just learned, and should have learned years ago, is that several of my motherboards, such as the Supermicro A1SAi and Intel NUC, support UEFI PXE booting but do not support TFTP servers outside of their local /64 network. Doh! They will happily get an address from a DHCPv6 server on another network via a relay, that’s not a problem, but if the TFTP server is not on the same LAN, the NBP download process times out and fails. It would seem that LinkedIn learned this years ago too. This is similar in effect to my misconfigured DHCP server the other day, but not the same cause.

The only solution is to either have a TFTP server on the same LAN as the target system, or to keep around legacy IPv4 networking so the target system can use UEFI IPv4 PXE to boot something like syslinux.efi, GRUB2, or iPXE, which in turn has IPv6 support and can finish downloading the kernel and initramfs over IPv6.

At first I thought I was doing something wrong in Kea (and I verified this with the old ISC DHCP), but no, packet captures prove that during UEFI PXE boot the system makes zero effort to send out Router Solicitations. It also tries to do Neighbor Discovery for IPv6 addresses that it should be sending to the default gateway, which implies it’s not honoring the Router Advertisements that tell the system its prefix and prefix length. Or it has some wild ideas about what it thinks is “on-link”, which is how IPv6 determines whether something is on the same L2 network.

An example

Here’s a target system, a Supermicro A1SAi-2550 with MAC address 0c:c4:7a:32:27:6c, trying to UEFI PXE boot over IPv6:

First, the Kea DHCPv6 server configuration. It just says: here is your IP address, 2001:470:8122:1::9, and go fetch GRUB2 via TFTP at 2001:470:1f05:2c9::10:

    "client-classes": [
      {
        "name": "grub2_tftp_efi",
        "test": "option[61].hex == 0x0007",
        "option-data": [
          {
            "name": "bootfile-url",
            "data": "tftp://[2001:470:1f05:2c9::10]/efi/bootx64.efi"
          }
    ...
    ...
    "subnet6": [
    ...
    ...
    "hostname": "basic09.wann.net",
                  "hw-address": "0c:c4:7a:32:27:6c",
                  "ip-addresses": [ "2001:470:8122:1::9" ],
                  "client-classes": [ "ikeacluster" ]
    ...

On boot, this is displayed on console:

>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv6..
  Station IP address is 2001:470:8122:1:0:0:0:9

  ....long 20 second wait...

  Server IP address is 2001:470:1F05:2C9:0:0:0:10
  NBP filename is efi/bootx64.efi
  NBP filesize is 0 Bytes
  PXE-E18: Server response timeout.

This tells us the target system did a successful DHCPv6 Solicit/Advertise/Request/Reply (S.A.R.R.) with Kea and understood the bootfile-url option in the DHCPv6 response. But then it got zero bytes.

From the standpoint of the DHCPv6 and TFTP servers, there’s not much to see. The SARR process happens, and that’s it. Nothing tries to hit the tftp server at all.

From a packet capture of the router (:89:f0) facing the Supermicro system (:27:6c) we see:

Solicit XID: 0xe8a7e3 CID: 000100013875b6020cc47a32276c                               ok
Advertise XID: 0xe8a7e3 CID: 000100013875b6020cc47a32276c IAA: 2001:470:8122:1::9     ok
Request XID: 0xe9a7e3 CID: 000100013875b6020cc47a32276c IAA: 2001:470:8122:1::9       ok
Reply XID: 0xe9a7e3 CID: 000100013875b6020cc47a32276c IAA: 2001:470:8122:1::9         ok
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c                what!
Neighbor Solicitation for fe80::ec4:7aff:fe32:276c from fc:ec:da:4a:89:f0             < router :f0 asks who :6c is
Neighbor Advertisement fe80::ec4:7aff:fe32:276c (sol, ovr) is at 0c:c4:7a:32:27:6c    < :6c replies
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c                what!
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c                what!
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for fe80::feec:daff:fe4a:89f0 from 0c:c4:7a:32:27:6c            < :6c unicast-asks who's :89:f0
Neighbor Advertisement fe80::feec:daff:fe4a:89f0 (rtr, sol)                           < :f0 replies I am he, also I'm a router
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
Release XID: 0xeaa7e3 CID: 000100013875b6020cc47a32276c IAA: 2001:470:8122:1::9       < :6c I give up
Reply XID: 0xeaa7e3 CID: 000100013875b6020cc47a32276c

We see the Supermicro go through the whole SARR process. DHCPv6 by design does not carry any router details or subnet/prefix information. It’s up to the target system to listen for Router Advertisements to learn the prefix of the LAN it’s attached to. In other words, the Supermicro should assume it is 2001:470:8122:1::9/128 until something tells it otherwise. Here the Supermicro did not make any sort of Router Solicitation. I’ve filtered them out for brevity, but the router was indeed sending out RAs every 4 seconds, so it had ample time and had at least 5 go by in this time frame.
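
To put the on-link decision in concrete terms, here’s a toy Python version of the check the firmware appears to be skipping, using the prefixes from this example. An address that matches no RA-learned on-link prefix has to be sent to the default router instead of queried with Neighbor Discovery:

  import ipaddress

  # Prefix the router advertises as on-link for this LAN (from the RA)
  on_link_prefixes = [ipaddress.ip_network("2001:470:8122:1::/64")]

  tftp = ipaddress.ip_address("2001:470:1f05:2c9::10")
  if any(tftp in p for p in on_link_prefixes):
      print("on-link: Neighbor Discovery directly for", tftp)
  else:
      print("off-link: send via the default router")   # the right answer here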

My hypothesis is that maybe the IP stack did receive an RA but just decided everything is “on-link” anyway. Or it has a wildly wrong prefix configured and thinks everything in the world is on the same network. For giggles I did try to fetch from a Comcast 2601:646:: network so it was at least in a different /14; that didn’t help. In any event, the Supermicro starts sending out Neighbor Discovery requests for the TFTP server at 2001:470:1f05:2c9::10 over and over, which is on a completely different subnet on a completely different LAN.

It tries this for many seconds and eventually gives up. It’s nice enough to release the DHCPv6 lease before it returns to the boot menu.

How to fix?

I don’t know if there is a fix for this, at least not one available to me. I’ve already tried upgrading the Supermicro BIOS, which jumped it way ahead from a 2014 vintage to 2019. I’m sure Supermicro’s solution is “buy something newer”.

In the meantime I’m going to go back to booting GRUB2 over IPv4 and be mad about it.

A peek inside PXE – TianoCore EDK II

Googling for anything related to PXE booting is futile. Pages and pages of people way off the mark and no real definitive information. The UEFI 2.1 and 2.7 specifications are useful; they go into a lot of detail about what should happen, but it’s up to others to actually write the code. Somehow I stumbled upon TianoCore EDK II (EFI Development Kit II), which from what I gather began as Intel’s original EFI reference code, was open sourced, and has now grown into its own reference UEFI codebase. TianoCore is the community, EDK II is the reference implementation.

I have no idea if Supermicro’s UEFI code is based off of EDK; it seems fairly similar from what little I can see, at least. Maybe not, because EDK supports UEFI HTTP boot and my Supermicro doesn’t. I give the Intel NUC better odds of using code from the same pedigree.

The EDK II code is a fascinating read, especially the NetworkPkg/UefiPxeBcDxe code that shows an actual PXE implementation, “Start PXE over IPv6” and all. It answers a few questions, such as the real format for bootfile-url options in DHCP (tftp://ip.address/path/path/file), why the leading / gets chopped off paths, and the variety of code paths that get you to different PXE-Exx error codes.

Another cute thing I learned from the EDK code, and I’ve seen it on the NUC, is that every dot it prints after “Start PXE over IPv6” means the stack has sent a packet on the network.

[photos: flickr – Vintage dial-up modem teardowns]

[photos: flickr – Analog telephone adapters]

For several months I’ve been buying old popular models of dial-up modems from the 1990s to test how they fare over VoIP connections along with different analog telephone adapters. To my great annoyance, maybe a quarter of them didn’t include an AC power adapter, so I had to do a bunch of sleuthing to figure out whether each modem took AC or DC power, the voltage, the expected amperage, and what type and size of power connector it used. What worked for one model is no guarantee it works for another, similar one.

USR Courier I-modem AC transformer guts

For instance, even among my USR Courier V.Everything modems, models 1868, 2806, and 3453C, they came with AC step-down transformers that output 20 VAC, 9 VAC, or 15 VAC. The USR Courier I-modem AC adapter claims it has a 20 VAC output, but after getting weird measurements on the output pins, I cut open the impossible-to-find AC transformer to find it has a diode inside, which seems to imply it’s outputting half-wave rectified DC-ish power, and that a much-easier-to-find DC-only supply might work.

It looks like Retro Web doesn’t allow for documentation of external devices like modems, and there’s no good collection of this information that I’m aware of. To help future generations avoid this problem, I started photographing and noting the details of every power supply in my collection. And for history’s sake I decided to open up the modems and make high-quality-ish photos of them too. Hopefully this will let people find cheap replacements for modems they buy, or in the case of the Courier I-modem, find a workaround replacement, because they are very rare.

At least one, the first-gen USR Courier I-modem, had leaking electrolytic capacitors, so I’ve taken extra photos of the caps to capture size information. Unfortunately I am not yet an expert on circuit design, DSPs, and ROMs, so I don’t have much illuminating commentary or stories to tell about these modems.

For now I have all the teardown photos in a single, large Flickr album, organized by modem name/model.

I haven’t decided how I want to organize these, whether to put together a modem wiki over on Tuxedocatbbs.com or go for a more structured approach like Retro Web did. I have more information that goes along with them: manuals I’ve scanned or dug up, replacement capacitor sizing, and the init strings used during my testing.

As for the testing itself, that’s a whole ‘nother post. I used Qmodem on my 486 to make thousands of calls to my BBS and do a 64 KB Ymodem download each time. For actually calling, handshaking, and connecting, surprisingly all of the modems have almost a 100% success rate over VoIP without any speed restrictions; disabling V.92 quick connect is usually the only tweak I’ve had to make. However, actually trying a download is where things start telling different stories and results vary widely. Preliminary test data and results are over on the BBS website: https://tuxedocatbbs.com/stats/ccr.txt

As of 11/2024 I have these modems up:

  • Cardinal 28.8k V.34 external 020-0458
  • Hayes Smartmodem Optima 9600 “Optima 96” 2003 AM
  • Hayes Smartmodem 2400
  • Hayes Smartmodem Optima 288
  • Motorola ModemSURFR 33.6
  • Motorola Premier 33.6
  • MultiTech MultiModem II MT1432BA
  • MultiTech MultiModem II MT2834BA
  • MultiTech Multimodem MT5634ZBA
  • SupraFAXmodem 144 LC
  • SupraFAXmodem 288
  • SupraFAXmodemPlus 2400
  • Telebit Netblazer PN V.32bis
  • US Robotics 56k V.90/x2 (basically Sportster)
  • US Robotics USR5637 USB
  • US Robotics Courier 56k Business Modem 3453C
  • US Robotics Courier I-modem ISDN with V.Everything
  • US Robotics Courier I-modem with ISDN/V.34
  • US Robotics Courier V.Everything 1868
  • US Robotics Courier V.Everything 2806
  • US Robotics Sportster 56k with x2
  • Viva 9600/4800 2400 bps data fax
  • Zoom VFX V.32bis

Have you seen this modem?

Wang 9648/24e

One of my very first modems was a Wang 9648/24e, a 2400 bps fax/modem that I bought at Walmart around 1993. I have found exactly one photograph of this model on the Internet. Barely anyone seems to remember Wang, much less that Wang made modems. It wasn’t particularly good or bad, just a pokey 2400. I even used it for years at the ISP for credit card batch processing, because higher-speed modems had problems connecting to the processor. I tossed mine years and years ago, but if you come across one, send it to me! I thought the Viva 9600/4800 was a rebranded version, but after buying one it only looks vaguely similar and is most definitely nowhere near the same thing.

Update: 9:17 PM

Literally hours after I posted this, one sold on eBay! I’ve been keeping an eye out for it, but I guess I didn’t have a saved search set up.

Update 9:20: Oh, it’s actually a 9696/24e, which I’ve never heard of and which looks slightly larger, so not exactly the same, but still so close!


Petcube Bites 2.0 teardown

This is my second Petcube Bites, and after a few years of operation it stopped dispensing treats. Treats started getting jammed between the rotating loader head thingy and the slot leading to the launcher chute, and I’d have to empty it out and pick out the offending treat, only for it to jam the next time around. Using the little reducer didn’t seem to matter. The unit would growl and whirrrrr for 10-15 seconds before it timed out; it sounded like it had a stripped gear inside.

With no option left other than throwing it away, I opened it up to take a look:

[photos – flickr: Petcube Bites 2.0 teardown]

The unit is designed more simply than the 1st generation Petcube Bites. That one had a spring-loaded flipper thing that, I seem to recall, just stopped working; I couldn’t fix it after I took that one apart either. The 2.0 just has two motors, one for the “loader” at the top and another for the “launcher” on the side. I knew the 2.0 would launch treats with some force across my apartment; after opening it up I found out why. The launcher motor spins at a pretty good clip the whole time it’s waiting for the loader to feed in treats, then turns off. The whirrrrring sound I heard seems to be the launcher motor running empty until it times out.

The first gen had sensors in the launcher chute, which I assume were to tell whether a treat dropped or not. On the 2nd gen, both motors have a wheel that passes through opto-interrupters; I’m wondering if it measures slight changes in RPM to figure out whether a treat has been fed through.

Update 11/27/2024:

It’s jamming again. After scooping the treats out of the way and looking at the feeder mechanism, I think what’s happening is that a treat gets plucked by the rotating head but hangs up on the ledge of the chute. Instead of falling in, it just hangs there and the motor keeps trying to crush it. I thought I’d seen the motor change direction before, but in this case it keeps twisting and twisting until it times out.

If it would just back off and reverse part of a turn to let the stuck treat drop, I think that would fix this.

