# SSH Infrastructure - Single Source of Truth

Last updated: 2026-05-21

This is the only project documentation file. Keep architecture, key handling,
sync/deploy steps, troubleshooting, and maintenance notes here. Do not add
separate Markdown documents for the same subject unless this README is split by
explicit decision.

## Read This First

This repository manages SSH access from Bogdan's macOS workstation to Next-Gen
company hosts through:

```text
local macOS
  -> is-jumper 192.168.2.100
  -> J1/J2 10.253.51.50/52:25904
  -> final hosts: porta, pbx, radius, voip, network gear
```

The key details agents keep missing:

- The local machine does not hold the company hardware key.
- The physical RSA smartcard is mounted only on `is-jumper`.
- The wrapper logs into `is-jumper`, sets
  `SSH_AUTH_SOCK=/run/user/0/gnupg/S.gpg-agent.ssh`, then runs SSH from there to
  J1/J2.
- J1/J2 must use user `bogdan.timofte`.
- `is-jumper` itself must use local key `~/.ssh/keys/is-jumper_ed25519`.
- `ssh` on macOS must resolve to `~/.local/bin/ssh`, not `/usr/bin/ssh`, for
  company aliases.

Fast health check:

```bash
which ssh
ssh -G is-jumper | grep -E '^(hostname|user|identityfile|identitiesonly) '
ssh -G j1 | grep -E '^(hostname|user|port) '
ssh is-jumper hostname
ssh porta-sip hostname
```

Expected highlights:

```text
/Users/bogdan/.local/bin/ssh
identityfile ~/.ssh/keys/is-jumper_ed25519
user bogdan.timofte
p12.voip.ro
```

## Repository Rules

Project source:

```text
/Users/bogdan/Documents/Workspaces/Bogdan/ssh-infrastructure
```

Runtime OpenSSH state:

```text
~/.ssh/config
~/.ssh/known_hosts
~/.ssh/authorized_keys
~/.ssh/keys/
~/.local/bin/ssh
~/.local/bin/scp
~/.local/bin/sftp
```

Only edit source files in the repository. Do not edit generated runtime files by
hand.

Tracked source files:

```text
README.md                         this file, the only documentation
inventory/hosts.yaml              upstream/company host inventory
inventory/hosts-local.yaml        local overlay and local lab inventory
schema/hosts.schema.json          inventory schema
scripts/ssh-wrapper.sh            installed as ~/.local/bin/ssh
scripts/scp-wrapper.sh            installed as ~/.local/bin/scp
scripts/sftp-wrapper.sh           installed as ~/.local/bin/sftp
tools/generate-configs.py         config generator
tools/deploy-local.sh             local deploy
tools/sync-hosts-from-upstream.sh upstream inventory sync
tools/migrate-modern-key.sh       legacy local key migration helper
.gitignore
```

Ignored or runtime-only files:

```text
generated/
SSH_SETUP_SUMMARY.md
authorized_keys
known_hosts
known_hosts.old
keys/
agent/
conf.d/
import/
*.pem *.key *.ppk *.der *.csr
```

Git basics:

```bash
git status
git add README.md inventory schema scripts tools .gitignore
git commit -m "Describe change"
```

Known remotes:

```text
nextgen  ssh://git@192.168.2.103/home/git/repositories/bogdan/NextGen-Host-List.git
mazeri   ssh://git@192.168.2.102/home/git/repositories/bogdan/SSH-Infrastructure.git
```

## Architecture

### Network

```text
192.168.2.0/24 - local office/lab network
  is-jumper 192.168.2.100 - VPN client and hardware-key guardian
  local lab hosts

10.253.51.0/24 - internal company network reached from is-jumper VPN
  J1 10.253.51.50:25904
  J2 10.253.51.52:25904
  final hosts
```

`is-jumper` is not a VPN server. It is a local host that has VPN reachability to
the company network and has the physical smartcard mounted.

### Access Chains

Standard final-host chain:

```text
local wrapper
  -> /usr/bin/ssh is-jumper
  -> SSH_AUTH_SOCK=/run/user/0/gnupg/S.gpg-agent.ssh ssh -A J1
  -> ssh final-host
```
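
As a rough sketch of how the standard chain could be composed, the function below assembles the command string that would run on `is-jumper`. The name `build_jump_command` and the exact flag layout are assumptions made for illustration; the real `scripts/ssh-wrapper.sh` may quote and order things differently. The sketch only prints the string and never opens a connection.

```bash
#!/usr/bin/env bash
set -u

# Agent socket on is-jumper, as documented in this README.
AGENT_SOCK="/run/user/0/gnupg/S.gpg-agent.ssh"

# Hypothetical helper: print the command that would run on is-jumper.
# $1 = jump user@host, $2 = jump port, $3 = final host, rest = remote command.
build_jump_command() {
  local jump_target="$1" jump_port="$2" final_host="$3"
  shift 3
  local line="SSH_AUTH_SOCK=$AGENT_SOCK ssh -A -p $jump_port $jump_target ssh $final_host"
  if [ "$#" -gt 0 ]; then
    line="$line $*"
  fi
  printf '%s\n' "$line"
}

build_jump_command bogdan.timofte@10.253.51.50 25904 porta-sip hostname
# prints: SSH_AUTH_SOCK=/run/user/0/gnupg/S.gpg-agent.ssh ssh -A -p 25904 bogdan.timofte@10.253.51.50 ssh porta-sip hostname
```

The printed line is the kind of thing to look for when tracing the wrapper with `bash -x`.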

Interactive J1/J2 login:

```text
local wrapper -> is-jumper -> J1/J2
```

Emergency public routes:

```text
local wrapper -> is-jumper -> j1.next-gen.ro or j2.next-gen.ro
```

The wrapper strips custom flags before calling real SSH:

```text
-J1  use J1 VPN route, default
-J2  use J2 VPN route
-j1  use public j1 route
-j2  use public j2 route
```
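
A minimal sketch of the flag stripping, assuming the wrapper matches the four route flags exactly and passes everything else through untouched. The function name `split_route_flags` and the variable `route` are hypothetical; `cmd_args` mirrors the array name mentioned elsewhere in this README. The real wrapper may handle edge cases (for example, a route flag that appears as the value of another option) differently.

```bash
#!/usr/bin/env bash
set -u

route="J1"      # default route, per the flag list
cmd_args=()     # arguments that would be passed to the real ssh

# Hypothetical helper: separate wrapper-only route flags from ssh arguments.
split_route_flags() {
  route="J1"
  cmd_args=()
  local arg
  for arg in "$@"; do
    case "$arg" in
      -J1|-J2|-j1|-j2) route="${arg#-}" ;;   # remember the route, drop the flag
      *) cmd_args+=("$arg") ;;               # everything else stays for real ssh
    esac
  done
}

split_route_flags -J2 porta-sip hostname
echo "route=$route args=${cmd_args[*]+${cmd_args[*]}}"
# prints: route=J2 args=porta-sip hostname
```

The flags must be removed before the real `ssh` runs because OpenSSH would otherwise parse `-J1` as its own `-J` (ProxyJump) option with the argument `1`.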

Do not reintroduce local port forwarding, Python relays,
`IdentityAgent /tmp/...`, or helper scripts that bridge the physical-card socket
to the local machine. Those were removed for compliance reasons and to stop
SentinelOne alert noise.

## Keys

### Key Matrix

| Key | Location | Purpose |
| --- | --- | --- |
| Physical smartcard RSA 4096 | only on `is-jumper` | Auth from `is-jumper` to J1/J2/company network |
| `is-jumper_ed25519` | local `~/.ssh/keys/is-jumper_ed25519` | Auth from macOS to `is-jumper` |
| Modern ED25519 | local `~/.ssh/id_ed25519` or `~/.ssh/keys/id_ed25519` | Local lab and migrated hosts |
| Legacy RSA | local `~/.ssh/keys/id_rsa_old` | Temporary migration fallback for old local hosts |

Critical config values:

```yaml
entrypoints:
  is_jumper:
    hostname: 192.168.2.100
    user: root
    identity_file: ~/.ssh/keys/is-jumper_ed25519
    identities_only: true

jumps:
  j1:
    hostname: 10.253.51.50
    user: bogdan.timofte
    port: 25904
  j2:
    hostname: 10.253.51.52
    user: bogdan.timofte
    port: 25904
```

If J1/J2 use `bogdan` instead of `bogdan.timofte`, final host SSH will fail with
an error like:

```text
bogdan@10.253.51.50: Permission denied (publickey).
Connection to 192.168.2.100 closed.
```

Fix that in `inventory/hosts-local.yaml`, deploy, then verify:

```bash
tools/deploy-local.sh
ssh -G j1 | grep -E '^(hostname|user|port) '
ssh porta-sip hostname
```

## Inventory and Generation

The generator reads:

```text
inventory/hosts.yaml
inventory/hosts-local.yaml if it exists
```

Important: the inventory merge is shallow. Later top-level maps from
`hosts-local.yaml` override upstream maps. This is useful for local lab entries
but dangerous for defaults. If `hosts-local.yaml` changes `defaults.jump.user`,
then local `jumps.j1` and `jumps.j2` must specify `user: bogdan.timofte`
explicitly.

Generated files:

```text
generated/client.conf      installed as ~/.ssh/config
generated/is-jumper.conf   server-side helper config
generated/j1.conf          server-side final-host config
generated/j2.conf          server-side final-host config
```

`generated/` is ignored by git. Recreate it any time:

```bash
python3 tools/generate-configs.py
```

Deploy local runtime:

```bash
tools/deploy-local.sh
```

The deploy script performs these steps:

```text
1. run tools/generate-configs.py
2. install generated/client.conf as ~/.ssh/config
3. install scripts/ssh-wrapper.sh as ~/.local/bin/ssh
4. install scripts/scp-wrapper.sh as ~/.local/bin/scp
5. install scripts/sftp-wrapper.sh as ~/.local/bin/sftp
6. remove obsolete ~/.ssh/scripts wrapper copies
```

It does not touch private keys, `authorized_keys`, or `known_hosts`.

## Local Shell and Wrappers

For company aliases, `ssh` must be the wrapper:

```bash
which ssh
# /Users/bogdan/.local/bin/ssh
```

If it shows `/usr/bin/ssh`, fix the shell PATH and reload:

```bash
source ~/.zshrc
which ssh
```

The current shell startup should keep `~/.local/bin` first in both interactive
and login shells. If editing these files, preserve this behavior:

```zsh
path=("$HOME/.local/bin" ${path:#"$HOME/.local/bin"})
export PATH
```
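
The PATH requirement can be checked mechanically. The sketch below is one way to do it; `wrapper_dir_is_first` is a hypothetical helper, not part of the repository, and it works on a PATH string passed as an argument so it can be tried without changing the current shell.

```bash
#!/usr/bin/env bash
set -u

# Hypothetical helper: succeed only if $2 is the first entry of the PATH string $1.
wrapper_dir_is_first() {
  local first="${1%%:*}"    # text before the first colon
  [ "$first" = "$2" ]
}

good="/Users/bogdan/.local/bin:/usr/local/bin:/usr/bin"
bad="/usr/bin:/Users/bogdan/.local/bin"

wrapper_dir_is_first "$good" "/Users/bogdan/.local/bin" && echo "good: wrapper first"
wrapper_dir_is_first "$bad" "/Users/bogdan/.local/bin" || echo "bad: /usr/bin/ssh would win"
```

Against a live shell it would be called as `wrapper_dir_is_first "$PATH" "$HOME/.local/bin"`.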

`ssh-wrapper.sh` uses bash 3.2 compatible array expansion under `set -u`.
Do not replace guarded forms like:

```bash
${cmd_args[@]+"${cmd_args[@]}"}
```

with plain:

```bash
"${cmd_args[@]}"
```

On macOS bash 3.2, expanding an empty array under `set -u` can fail with:

```text
cmd_args[@]: unbound variable
```
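
A small standalone demonstration of the guarded form, using a throwaway `count_args` helper that is not part of the wrapper. It shows that the guard expands to nothing for an empty array and still preserves word boundaries for a populated one.

```bash
#!/usr/bin/env bash
set -u

# Throwaway helper: report how many arguments it received.
count_args() { echo "$#"; }

cmd_args=()
count_args ${cmd_args[@]+"${cmd_args[@]}"}    # prints 0, no unbound-variable error

cmd_args=("-o" "ConnectTimeout=5" "host with space")
count_args ${cmd_args[@]+"${cmd_args[@]}"}    # prints 3, the spaced element stays one word
```

The outer `${...+...}` is deliberately unquoted: when the array is empty it yields no words at all, and when the array has elements the inner quoted expansion keeps each element intact.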

## Sync from Upstream

Pull upstream `hosts.yaml`, apply the local `is-jumper` key override, validate
generation, and deploy if changed:

```bash
tools/sync-hosts-from-upstream.sh
```

Defaults:

```text
UPSTREAM_SSH_TARGET=nextgen@192.168.2.103
UPSTREAM_HOSTS_PATH=/home/nextgen/projects/ssh-infrastructure/inventory/hosts.yaml
LOCAL_IS_JUMPER_IDENTITY_FILE=~/.ssh/keys/is-jumper_ed25519
DEPLOY_AFTER_SYNC=1
FORCE_DEPLOY=0
```

Useful overrides:

```bash
UPSTREAM_HOSTS_FILE=/tmp/hosts.yaml tools/sync-hosts-from-upstream.sh
DEPLOY_AFTER_SYNC=0 tools/sync-hosts-from-upstream.sh
FORCE_DEPLOY=1 tools/sync-hosts-from-upstream.sh
UPSTREAM_SSH_TARGET=user@host tools/sync-hosts-from-upstream.sh
```
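
Overrides of this kind usually work because the script falls back to a default only when the environment variable is unset or empty. The sketch below illustrates that pattern with three of the documented variables; `show_settings` is a hypothetical function, and whether `tools/sync-hosts-from-upstream.sh` uses exactly this expansion is an assumption.

```bash
#!/usr/bin/env bash
set -u

# Hypothetical illustration of environment overrides with documented defaults.
show_settings() {
  local target="${UPSTREAM_SSH_TARGET:-nextgen@192.168.2.103}"
  local deploy="${DEPLOY_AFTER_SYNC:-1}"
  local force="${FORCE_DEPLOY:-0}"
  echo "target=$target deploy=$deploy force=$force"
}

# A prefix assignment applies only to this one call.
DEPLOY_AFTER_SYNC=0 show_settings
```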

After sync, always check J1 user because the local overlay can override jump
defaults:

```bash
ssh -G j1 | grep -E '^(hostname|user|port) '
```

Expected:

```text
user bogdan.timofte
hostname 10.253.51.50
port 25904
```

## Adding or Changing Hosts

For company/Next-Gen hosts:

```text
1. Edit inventory/hosts.yaml or sync it from upstream.
2. Keep local-only corrections in inventory/hosts-local.yaml.
3. Run tools/deploy-local.sh.
4. Verify with ssh -G <alias>.
5. Verify read-only with ssh <alias> hostname.
6. Commit source changes only.
```

For local lab hosts:

```text
1. Edit inventory/hosts-local.yaml.
2. Run tools/deploy-local.sh.
3. Verify with ssh <alias> hostname.
4. Commit the local overlay change.
```

Common inventory defaults:

| Context | User | Port |
| --- | --- | --- |
| J1/J2 company jump | `bogdan.timofte` | `25904` for VPN route |
| Company final hosts | usually `bogdan` | usually `22` |
| Company inherited jump config | `bogdan.timofte` | often `24` |
| Local lab hosts | usually `bogdan` | usually `22` |
| Cisco/OLT interactive devices | inventory-specific | `22` |

For Cisco/OLT/password-interactive devices, set:

```yaml
auth: password_interactive
```

The wrapper then avoids forcing `BatchMode=yes` and disables pubkey auth for
that final hop.
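
The behavior described above could look roughly like this. `final_hop_options` is a hypothetical name, and the sketch assumes the wrapper picks options purely from the inventory `auth` value; `BatchMode` and `PubkeyAuthentication` are standard OpenSSH client options.

```bash
#!/usr/bin/env bash
set -u

# Hypothetical helper: choose final-hop ssh options from the inventory auth value.
final_hop_options() {
  case "${1:-}" in
    password_interactive)
      # Let the device prompt for a password and skip pubkey attempts.
      echo "-o PubkeyAuthentication=no"
      ;;
    *)
      # Key-based default: fail fast instead of prompting.
      echo "-o BatchMode=yes"
      ;;
  esac
}

final_hop_options password_interactive   # prints: -o PubkeyAuthentication=no
final_hop_options ""                     # prints: -o BatchMode=yes
```

`BatchMode=yes` disables password prompts entirely, which is why it cannot be forced on devices that only accept interactive passwords.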

## Key Migration for Local Legacy Hosts

Modern preferred key:

```text
~/.ssh/id_ed25519.pub
```

Legacy fallback key:

```text
~/.ssh/keys/id_rsa_old
```

Migrate all configured local legacy hosts:

```bash
tools/migrate-modern-key.sh
```

Migrate one host:

```bash
tools/migrate-modern-key.sh is-baobab
```

Manual fallback if password access is available:

```bash
ssh -o PubkeyAuthentication=no user@host \
  "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys" \
  < ~/.ssh/id_ed25519.pub
```

Keep `id_rsa_old` until all legacy hosts are verified with the modern key.

## Verification Checklist

Run after deploy, sync, wrapper edits, or inventory changes:

```bash
which ssh
ssh -G is-jumper | grep -E '^(hostname|user|identityfile|identitiesonly) '
ssh -G j1 | grep -E '^(hostname|user|port) '
ssh is-jumper hostname
ssh is-jumper 'SSH_AUTH_SOCK=/run/user/0/gnupg/S.gpg-agent.ssh ssh-add -L | sed -n 1p'
ssh porta-sip hostname
ssh pbx-bo hostname
```

Expected signals:

```text
which ssh                         -> /Users/bogdan/.local/bin/ssh
is-jumper hostname                -> is-vpn-gw
j1 user                           -> bogdan.timofte
physical card check               -> ssh-rsa ... cardno:6446168
porta-sip hostname                -> p12.voip.ro
pbx-bo hostname                   -> pbx-bo
```
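
Individual signals can be checked in a script instead of by eye. The sketch below pulls one key out of `ssh -G` style output; `config_value` is a hypothetical helper, and the sample text stands in for real `ssh -G j1` output so the block runs without network access.

```bash
#!/usr/bin/env bash
set -u

# Hypothetical helper: read "key value" lines on stdin, print the value for key $1.
config_value() {
  awk -v key="$1" '$1 == key { print $2; exit }'
}

# Stand-in for: ssh -G j1
sample_j1=$'user bogdan.timofte\nhostname 10.253.51.50\nport 25904'

actual="$(printf '%s\n' "$sample_j1" | config_value user)"
if [ "$actual" = "bogdan.timofte" ]; then
  echo "j1 user OK"
else
  echo "j1 user WRONG: $actual"
fi
# prints: j1 user OK
```

Against the live config the same check would be `ssh -G j1 | config_value user`.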

Interactive smoke test:

```bash
printf "exit\n" | ssh porta-sip
printf "exit\n" | ssh pbx-bo
```

## Troubleshooting

### `bogdan@10.253.51.50: Permission denied (publickey)`

The wrapper reached `is-jumper`, but J1 was attempted with user `bogdan`.
J1/J2 need `bogdan.timofte`.

Check:

```bash
ssh -G j1 | grep -E '^(hostname|user|port) '
```

Fix:

```yaml
# inventory/hosts-local.yaml
jumps:
  j1:
    user: bogdan.timofte
  j2:
    user: bogdan.timofte
```

Deploy:

```bash
tools/deploy-local.sh
ssh porta-sip hostname
```

### `root@192.168.2.100: Permission denied`

The local connection to `is-jumper` is using the wrong key.

Check:

```bash
ssh -G is-jumper | grep -E '^(user|hostname|identityfile|identitiesonly) '
ls -l ~/.ssh/keys/is-jumper_ed25519
```

Expected:

```text
user root
hostname 192.168.2.100
identityfile ~/.ssh/keys/is-jumper_ed25519
identitiesonly yes
```

If generated config is wrong, fix `inventory/hosts-local.yaml` or
`inventory/hosts.yaml`, then deploy.

### `ssh pbx-bo` uses `/usr/bin/ssh`

The wrapper is not first in PATH.

Check:

```bash
which ssh
```

Fix current shell:

```bash
source ~/.zshrc
```

If needed, ensure `.zprofile` and `.zshrc` both move `~/.local/bin` to the front
using the zsh `path` array, not a guard that skips the change and leaves the
directory later in PATH.

### `cmd_args[@]: unbound variable`

This is a bash 3.2 plus `set -u` empty-array issue in `ssh-wrapper.sh`.

Use guarded array expansion:

```bash
${array[@]+"${array[@]}"}
```

Do not simplify it.

### Physical card missing on `is-jumper`

Check:

```bash
ssh is-jumper 'ls -l /run/user/0/gnupg/S.gpg-agent.ssh'
ssh is-jumper 'SSH_AUTH_SOCK=/run/user/0/gnupg/S.gpg-agent.ssh ssh-add -L | sed -n 1p'
```

Expected key output contains:

```text
cardno:6446168
```

If missing, the issue is on `is-jumper`: gpg-agent, card mount, permissions, or
hardware state.

### Direct command works but wrapper fails

Trace the command the wrapper generates:

```bash
bash -x ~/.local/bin/ssh porta-sip hostname
```

Look for:

```text
SSH_AUTH_SOCK=/run/user/0/gnupg/S.gpg-agent.ssh
bogdan.timofte@10.253.51.50
```

If either is wrong, fix the inventory, the local overlay, or the wrapper source,
then redeploy.

### Generated config was edited manually

Discard manual runtime edits by redeploying:

```bash
tools/deploy-local.sh
```

Then verify:

```bash
ssh -G j1 | grep -E '^(hostname|user|port) '
```

## Compatibility and Compliance

Do not reintroduce these removed patterns:

```text
j1-relay.sh
ssh-proxy.sh
ensure-ssh-agent-bridge.sh
ensure-ssh-jump.sh
local socket forwarding for the hardware card
Python/base64 port-forwarding relays
per-host local ProxyCommand bridges
```

Current compliant model:

```text
local wrapper -> ssh is-jumper -> run normal ssh from is-jumper
```

Compatibility options for old final hosts belong in inventory or on jump hosts,
not in ad-hoc local forwarding scripts.

## Maintenance Notes for Agents

Before changing anything:

```bash
git status --short --branch
which ssh
ssh -G j1 | grep -E '^(hostname|user|port) '
```

When fixing auth:

```text
1. Identify which hop failed from the error user@host.
2. is-jumper failures mean local key/config.
3. J1/J2 failures mean hardware card, SSH_AUTH_SOCK, or jump user.
4. final-host failures mean final host user/auth/port.
5. Apply the fix in inventory or wrapper source, not generated config.
6. Run tools/deploy-local.sh.
7. Run read-only SSH verification.
8. Commit the source change.
```
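
Step 1 can be sketched as a small classifier over the error line. `classify_failed_hop` is a hypothetical helper; the addresses come from this README, and anything not recognized as `is-jumper` or a jump host is assumed to be a final host.

```bash
#!/usr/bin/env bash
set -u

# Hypothetical helper: map the user@host prefix of an ssh error to the failing hop.
classify_failed_hop() {
  local target="${1%%:*}"     # "user@host" before the first colon
  local host="${target#*@}"   # strip "user@"
  case "$host" in
    192.168.2.100)
      echo "is-jumper: check local key/config" ;;
    10.253.51.50|10.253.51.52|j1.next-gen.ro|j2.next-gen.ro)
      echo "J1/J2: check hardware card, SSH_AUTH_SOCK, or jump user" ;;
    *)
      echo "final host: check user/auth/port" ;;
  esac
}

classify_failed_hop 'bogdan@10.253.51.50: Permission denied (publickey).'
# prints: J1/J2: check hardware card, SSH_AUTH_SOCK, or jump user
```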

Do not assume `hosts.yaml` alone is the effective config. Always remember
`inventory/hosts-local.yaml` is merged in by `tools/generate-configs.py`.

Do not trust stale docs, comments, or generated files over these commands:

```bash
ssh -G <alias>
tools/deploy-local.sh
ssh <alias> hostname
git diff
```