README

SSH Infrastructure - Single Source of Truth

Last updated: 2026-05-21

This is the only project documentation file. Keep architecture, key handling, sync/deploy steps, troubleshooting, and maintenance notes here. Do not add separate Markdown documents for the same subject unless this README is split by explicit decision.

Read This First

This repository manages SSH access from Bogdan's macOS workstation to Next-Gen company hosts through:

local macOS
  -> is-jumper 192.168.2.100
  -> J1/J2 10.253.51.50/52:25904
  -> final hosts: porta, pbx, radius, voip, network gear

The key details agents keep missing:

  • The local machine does not hold the company hardware key.
  • The physical RSA smartcard is mounted only on is-jumper.
  • The wrapper logs into is-jumper, sets SSH_AUTH_SOCK=/run/user/0/gnupg/S.gpg-agent.ssh, then runs SSH from there to J1/J2.
  • J1/J2 must use user bogdan.timofte.
  • is-jumper itself must use local key ~/.ssh/keys/is-jumper_ed25519.
  • ssh on macOS must resolve to ~/.local/bin/ssh, not /usr/bin/ssh, for company aliases.

Fast health check:

which ssh
ssh -G is-jumper | grep -E '^(hostname|user|identityfile|identitiesonly) '
ssh -G j1 | grep -E '^(hostname|user|port) '
ssh is-jumper hostname
ssh porta-sip hostname

Expected highlights:

/Users/bogdan/.local/bin/ssh
identityfile ~/.ssh/keys/is-jumper_ed25519
user bogdan.timofte
p12.voip.ro

Sources of Truth

There are two separate host tables, with separate ownership:

| Table | File / Location | Owner | What Belongs There |
| --- | --- | --- | --- |
| Local table | inventory/hosts-local.yaml in this repo | Us / Bogdan local workstation | Local lab hosts, local defaults, local key paths, and local overrides required for this Mac |
| NextGen table | nextgen@192.168.2.103:/home/nextgen/projects/ssh-infrastructure/inventory/hosts.yaml | NextGen / upstream | Company-managed NextGen host list: porta, pbx, radius, voip, network gear, and upstream defaults |

Operational rule:

inventory/hosts-local.yaml is our local source of truth.
inventory/hosts.yaml is a local copy of the NextGen upstream table.

Do not put local-only fixes into the upstream table unless they are true for NextGen as well. Keep Mac/local requirements in inventory/hosts-local.yaml.

The effective local config is generated from both files:

inventory/hosts.yaml        <- copied/synced from nextgen upstream
inventory/hosts-local.yaml  <- maintained locally by us
  -> tools/generate-configs.py
  -> generated/client.conf
  -> ~/.ssh/config

Critical local overrides currently required:

entrypoints:
  is_jumper:
    identity_file: ~/.ssh/keys/is-jumper_ed25519
    identities_only: true

jumps:
  j1:
    user: bogdan.timofte
  j2:
    user: bogdan.timofte

The sync script updates only the local copy of the upstream table:

tools/sync-hosts-from-upstream.sh

After every sync, verify the local overlay still produces the right effective config:

ssh -G j1 | grep -E '^(hostname|user|port) '
ssh -G is-jumper | grep -E '^(hostname|user|identityfile|identitiesonly) '

Expected:

user bogdan.timofte
identityfile ~/.ssh/keys/is-jumper_ed25519

Repository Rules

Project source:

/Users/bogdan/Documents/Workspaces/Bogdan/ssh-infrastructure

Runtime OpenSSH state:

~/.ssh/config
~/.ssh/known_hosts
~/.ssh/authorized_keys
~/.ssh/keys/
~/.local/bin/ssh
~/.local/bin/scp
~/.local/bin/sftp

Only edit source files in the repository. Do not edit generated runtime files by hand.

Tracked source files:

README.md                         this file, the only documentation
inventory/hosts.yaml              upstream/company host inventory
inventory/hosts-local.yaml        local overlay and local lab inventory
schema/hosts.schema.json          inventory schema
scripts/ssh-wrapper.sh            installed as ~/.local/bin/ssh
scripts/scp-wrapper.sh            installed as ~/.local/bin/scp
scripts/sftp-wrapper.sh           installed as ~/.local/bin/sftp
tools/generate-configs.py         config generator
tools/deploy-local.sh             local deploy
tools/sync-hosts-from-upstream.sh upstream inventory sync
tools/migrate-modern-key.sh       legacy local key migration helper
.gitignore

Ignored or runtime-only files:

generated/
SSH_SETUP_SUMMARY.md
authorized_keys
known_hosts
known_hosts.old
keys/
agent/
conf.d/
import/
*.pem *.key *.ppk *.der *.csr

Git basics:

git status
git add README.md inventory schema scripts tools .gitignore
git commit -m "Describe change"

Known remotes:

nextgen  ssh://git@192.168.2.103/home/git/repositories/bogdan/NextGen-Host-List.git
mazeri   ssh://git@192.168.2.102/home/git/repositories/bogdan/SSH-Infrastructure.git

Architecture

Network

192.168.2.0/24 - local office/lab network
  is-jumper 192.168.2.100 - VPN client and hardware-key guardian
  local lab hosts

10.253.51.0/24 - internal company network reached from is-jumper VPN
  J1 10.253.51.50:25904
  J2 10.253.51.52:25904
  final hosts

is-jumper is not a VPN server. It is a local host that has VPN reachability to the company network and has the physical smartcard mounted.

Access Chains

Standard final-host chain:

local wrapper
  -> /usr/bin/ssh is-jumper
  -> SSH_AUTH_SOCK=/run/user/0/gnupg/S.gpg-agent.ssh ssh -A J1
  -> ssh final-host
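
The standard chain can be pictured as one nested command line. The following is a dry-run sketch only: print_chain is a hypothetical helper that does not exist in this repository, and the real scripts/ssh-wrapper.sh may add options, quoting, or TTY handling that are not shown here.

```shell
# Dry-run sketch: prints the nested command, never executes it.
# print_chain is a hypothetical name used for illustration only.
print_chain() {
  local jump_user="$1" jump_host="$2" jump_port="$3" final_host="$4"
  local sock="/run/user/0/gnupg/S.gpg-agent.ssh"
  # Hop 1 runs locally with the real OpenSSH binary. Everything after
  # "is-jumper" is the command line that would run on is-jumper.
  printf '%s %s %s %s\n' \
    "/usr/bin/ssh is-jumper" \
    "SSH_AUTH_SOCK=$sock" \
    "ssh -A -p $jump_port $jump_user@$jump_host" \
    "ssh $final_host"
}

# Usage:
#   print_chain bogdan.timofte 10.253.51.50 25904 porta-sip
```

The point of the sketch is the ordering: the smartcard socket is set on is-jumper, not on the Mac, and the jump user and port belong to the second hop.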

Interactive J1/J2 login:

local wrapper -> is-jumper -> J1/J2

Emergency public routes:

local wrapper -> is-jumper -> j1.next-gen.ro or j2.next-gen.ro

The wrapper strips custom flags before calling real SSH:

-J1  use J1 VPN route, default
-J2  use J2 VPN route
-j1  use public j1 route
-j2  use public j2 route
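
A minimal sketch of how such flag stripping could work, assuming a hypothetical function split_route_flags and made-up route labels (J1-vpn, J2-vpn, j1-public, j2-public). It is not the code in scripts/ssh-wrapper.sh. It records the selected route, defaults to the J1 VPN route, and keeps every other argument for the real ssh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; names and route labels are assumptions.
set -u

route="J1-vpn"   # default route, per the flag list above
cmd_args=()      # arguments left over for the real ssh

split_route_flags() {
  route="J1-vpn"
  cmd_args=()
  local arg
  # "for arg; do" iterates over "$@" and is safe with zero arguments
  # on bash 3.2 under set -u.
  for arg; do
    case "$arg" in
      -J1) route="J1-vpn" ;;
      -J2) route="J2-vpn" ;;
      -j1) route="j1-public" ;;
      -j2) route="j2-public" ;;
      *)   cmd_args+=("$arg") ;;
    esac
  done
}

# Usage:
#   split_route_flags -J2 porta-sip hostname
#   echo "$route"   # J2-vpn
```

This sketch strips a matching token anywhere on the command line, including inside a remote command such as make -j2. A real wrapper would need to stop scanning once the destination host has been seen.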

Do not reintroduce local port forwarding, Python relays, IdentityAgent /tmp/..., or helper scripts that bridge the physical-card socket to the local machine. Those were removed for compliance reasons and to reduce SentinelOne noise.

Keys

Key Matrix

| Key | Location | Purpose |
| --- | --- | --- |
| Physical smartcard RSA 4096 | only on is-jumper | Auth from is-jumper to J1/J2/company network |
| is-jumper_ed25519 | local ~/.ssh/keys/is-jumper_ed25519 | Auth from macOS to is-jumper |
| Modern ED25519 | local ~/.ssh/id_ed25519 or ~/.ssh/keys/id_ed25519 | Local lab and migrated hosts |
| Legacy RSA | local ~/.ssh/keys/id_rsa_old | Temporary migration fallback for old local hosts |

Critical config values:

entrypoints:
  is_jumper:
    hostname: 192.168.2.100
    user: root
    identity_file: ~/.ssh/keys/is-jumper_ed25519
    identities_only: true

jumps:
  j1:
    hostname: 10.253.51.50
    user: bogdan.timofte
    port: 25904
  j2:
    hostname: 10.253.51.52
    user: bogdan.timofte
    port: 25904

If J1/J2 use bogdan instead of bogdan.timofte, final host SSH will fail with an error like:

bogdan@10.253.51.50: Permission denied (publickey).
Connection to 192.168.2.100 closed.

Fix that in inventory/hosts-local.yaml, deploy, then verify:

tools/deploy-local.sh
ssh -G j1 | grep -E '^(hostname|user|port) '
ssh porta-sip hostname

Inventory and Generation

The generator reads:

inventory/hosts.yaml
inventory/hosts-local.yaml if it exists

Important: the inventory merge is shallow. Later top-level maps from hosts-local.yaml override upstream maps. This is useful for local lab entries but dangerous for defaults. If hosts-local.yaml changes defaults.jump.user, then local jumps.j1 and jumps.j2 must specify user: bogdan.timofte explicitly.

Generated files:

generated/client.conf      installed as ~/.ssh/config
generated/is-jumper.conf   server-side helper config
generated/j1.conf          server-side final-host config
generated/j2.conf          server-side final-host config

generated/ is ignored by git. Recreate it any time:

python3 tools/generate-configs.py

Deploy local runtime:

tools/deploy-local.sh

Deploy does:

1. run tools/generate-configs.py
2. install generated/client.conf as ~/.ssh/config
3. install scripts/ssh-wrapper.sh as ~/.local/bin/ssh
4. install scripts/scp-wrapper.sh as ~/.local/bin/scp
5. install scripts/sftp-wrapper.sh as ~/.local/bin/sftp
6. remove obsolete ~/.ssh/scripts wrapper copies

It does not touch private keys, authorized_keys, or known_hosts.

Local Shell and Wrappers

For company aliases, ssh must be the wrapper:

which ssh
# /Users/bogdan/.local/bin/ssh

If it shows /usr/bin/ssh, fix shell PATH and reload:

source ~/.zshrc
which ssh

The current shell startup should keep ~/.local/bin first in both interactive and login shells. If editing these files, preserve this behavior:

path=("$HOME/.local/bin" ${path:#"$HOME/.local/bin"})
export PATH
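
To confirm the result without relying on which, a small check of the first PATH entry may help. local_bin_is_first is a hypothetical helper name, not something this repository ships.

```shell
# Succeeds only when the first entry of the given PATH string is
# $HOME/.local/bin. The PATH value is passed explicitly so the check
# can be tried against any string.
local_bin_is_first() {
  [ "${1%%:*}" = "$HOME/.local/bin" ]
}

# Usage:
#   local_bin_is_first "$PATH" && echo "wrapper directory is first"
```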

ssh-wrapper.sh uses bash 3.2 compatible array expansion under set -u. Do not replace guarded forms like:

${cmd_args[@]+"${cmd_args[@]}"}

with plain:

"${cmd_args[@]}"

On macOS bash 3.2, empty arrays plus set -u can fail with:

cmd_args[@]: unbound variable
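
A small self-contained demonstration of the guarded form, for anyone who wants to see why it is kept. count_args and count_guarded are hypothetical helpers used only for this illustration.

```shell
#!/usr/bin/env bash
set -u

# Print how many arguments were received.
count_args() { echo "$#"; }

cmd_args=()

# Guarded form: expands to nothing when the array has no elements,
# and to the individually quoted elements otherwise.
count_guarded() {
  count_args ${cmd_args[@]+"${cmd_args[@]}"}
}

# Usage:
#   cmd_args=()
#   count_guarded
#   cmd_args=("-o" "BatchMode=yes" "host with space")
#   count_guarded
```

With an empty array the helper receives zero arguments; with three elements it receives exactly three, including the one that contains spaces, so the quoting is preserved.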

Sync from Upstream

Pull upstream hosts.yaml, apply the local is-jumper key override, validate generation, and deploy if changed:

tools/sync-hosts-from-upstream.sh

Defaults:

UPSTREAM_SSH_TARGET=nextgen@192.168.2.103
UPSTREAM_HOSTS_PATH=/home/nextgen/projects/ssh-infrastructure/inventory/hosts.yaml
LOCAL_IS_JUMPER_IDENTITY_FILE=~/.ssh/keys/is-jumper_ed25519
DEPLOY_AFTER_SYNC=1
FORCE_DEPLOY=0

Useful overrides:

UPSTREAM_HOSTS_FILE=/tmp/hosts.yaml tools/sync-hosts-from-upstream.sh
DEPLOY_AFTER_SYNC=0 tools/sync-hosts-from-upstream.sh
FORCE_DEPLOY=1 tools/sync-hosts-from-upstream.sh
UPSTREAM_SSH_TARGET=user@host tools/sync-hosts-from-upstream.sh

After sync, always check J1 user because the local overlay can override jump defaults:

ssh -G j1 | grep -E '^(hostname|user|port) '

Expected:

user bogdan.timofte
hostname 10.253.51.50
port 25904
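
The comparison against the expected values can be scripted. The sketch below reads ssh -G output on standard input, so it can be tried without network access. expect_ssh_g and check_j1_config are hypothetical helper names, not part of tools/.

```shell
# expect_ssh_g KEY VALUE
# Reads `ssh -G <alias>` output on stdin and succeeds only if a line
# "KEY VALUE" is present.
expect_ssh_g() {
  awk -v k="$1" -v v="$2" '
    $1 == k { sub(/^[^ ]+ /, ""); if ($0 == v) found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Checks the three J1 values listed above and reports each mismatch.
check_j1_config() {
  local cfg status pair key value
  cfg="$(cat)"
  status=0
  for pair in "user bogdan.timofte" "hostname 10.253.51.50" "port 25904"; do
    key="${pair%% *}"
    value="${pair#* }"
    if ! printf '%s\n' "$cfg" | expect_ssh_g "$key" "$value"; then
      echo "MISMATCH: expected '$key $value'" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   ssh -G j1 | check_j1_config && echo "j1 config looks right"
```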

Adding or Changing Hosts

For company/Next-Gen hosts:

1. Edit inventory/hosts.yaml or sync it from upstream.
2. Keep local-only corrections in inventory/hosts-local.yaml.
3. Run tools/deploy-local.sh.
4. Verify with ssh -G <alias>.
5. Verify read-only with ssh <alias> hostname.
6. Commit source changes only.

For local lab hosts:

1. Edit inventory/hosts-local.yaml.
2. Run tools/deploy-local.sh.
3. Verify with ssh <alias> hostname.
4. Commit the local overlay change.

Common inventory defaults:

| Context | User | Port |
| --- | --- | --- |
| J1/J2 company jump | bogdan.timofte | 25904 for VPN route |
| Company final hosts | usually bogdan | usually 22 |
| Company inherited jump config | bogdan.timofte | often 24 |
| Local lab hosts | usually bogdan | usually 22 |
| Cisco/OLT interactive devices | inventory-specific | 22 |

For Cisco/OLT/password-interactive devices, set:

auth: password_interactive

The wrapper then avoids forcing BatchMode=yes and disables pubkey auth for that final hop.
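
A rough sketch of that decision, assuming the default path forces BatchMode=yes as the sentence above implies. final_hop_opts is a hypothetical function name. The options are real OpenSSH options, but how scripts/ssh-wrapper.sh actually assembles them may differ.

```shell
# Print the ssh options for the final hop, given the inventory auth value.
final_hop_opts() {
  case "${1:-}" in
    password_interactive)
      # No BatchMode=yes here, so ssh is allowed to prompt for a password.
      # Public-key auth is switched off for this hop only.
      echo "-o PubkeyAuthentication=no"
      ;;
    *)
      # Assumed default: non-interactive, fail instead of prompting.
      echo "-o BatchMode=yes"
      ;;
  esac
}

# Usage:
#   final_hop_opts password_interactive
```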

Key Migration for Local Legacy Hosts

Modern preferred key:

~/.ssh/id_ed25519.pub

Legacy fallback key:

~/.ssh/keys/id_rsa_old

Migrate all configured local legacy hosts:

tools/migrate-modern-key.sh

Migrate one host:

tools/migrate-modern-key.sh is-baobab

Manual fallback if password access is available:

ssh -o PubkeyAuthentication=no user@host \
  "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys" \
  < ~/.ssh/id_ed25519.pub

Keep id_rsa_old until all legacy hosts are verified with the modern key.

Verification Checklist

Run after deploy, sync, wrapper edits, or inventory changes:

which ssh
ssh -G is-jumper | grep -E '^(hostname|user|identityfile|identitiesonly) '
ssh -G j1 | grep -E '^(hostname|user|port) '
ssh is-jumper hostname
ssh is-jumper 'SSH_AUTH_SOCK=/run/user/0/gnupg/S.gpg-agent.ssh ssh-add -L | sed -n 1p'
ssh porta-sip hostname
ssh pbx-bo hostname

Expected signals:

which ssh                         -> /Users/bogdan/.local/bin/ssh
is-jumper hostname                -> is-vpn-gw
j1 user                           -> bogdan.timofte
physical card check               -> ssh-rsa ... cardno:6446168
porta-sip hostname                -> p12.voip.ro
pbx-bo hostname                   -> pbx-bo
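
The checklist can be wrapped in a tiny runner that compares the first line of each command's output with the expected signal. run_check is a hypothetical helper, not an existing tool in this repository. The usage lines reuse the expected values listed above.

```shell
# run_check LABEL EXPECTED CMD [ARGS...]
# Runs CMD, takes the first line of its stdout, and compares it to EXPECTED.
run_check() {
  local label="$1" expected="$2" actual
  shift 2
  # stdin is detached so a check never waits for terminal input.
  actual="$("$@" </dev/null 2>/dev/null | sed -n 1p)"
  if [ "$actual" = "$expected" ]; then
    printf 'ok   %s\n' "$label"
  else
    printf 'FAIL %s (expected %s, got %s)\n' "$label" "$expected" "$actual"
    return 1
  fi
}

# Usage:
#   run_check "is-jumper hostname" "is-vpn-gw"   ssh is-jumper hostname
#   run_check "porta-sip hostname" "p12.voip.ro" ssh porta-sip hostname
#   run_check "pbx-bo hostname"    "pbx-bo"      ssh pbx-bo hostname
```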

Interactive smoke test:

printf "exit\n" | ssh porta-sip
printf "exit\n" | ssh pbx-bo

Troubleshooting

bogdan@10.253.51.50: Permission denied (publickey)

The wrapper reached is-jumper, but J1 was attempted with user bogdan. J1/J2 need bogdan.timofte.

Check:

ssh -G j1 | grep -E '^(hostname|user|port) '

Fix:

# inventory/hosts-local.yaml
jumps:
  j1:
    user: bogdan.timofte
  j2:
    user: bogdan.timofte

Deploy:

tools/deploy-local.sh
ssh porta-sip hostname

root@192.168.2.100: Permission denied

The local connection to is-jumper is using the wrong key.

Check:

ssh -G is-jumper | grep -E '^(user|hostname|identityfile|identitiesonly) '
ls -l ~/.ssh/keys/is-jumper_ed25519

Expected:

user root
hostname 192.168.2.100
identityfile ~/.ssh/keys/is-jumper_ed25519
identitiesonly yes

If generated config is wrong, fix inventory/hosts-local.yaml or inventory/hosts.yaml, then deploy.

ssh pbx-bo uses /usr/bin/ssh

The wrapper is not first in PATH.

Check:

which ssh

Fix current shell:

source ~/.zshrc

If needed, make sure both .zprofile and .zshrc move ~/.local/bin to the front through the zsh path array, rather than using an add-if-missing guard that can leave it later in PATH.

cmd_args[@]: unbound variable

This is a bash 3.2 plus set -u empty-array issue in ssh-wrapper.sh.

Use guarded array expansion:

${array[@]+"${array[@]}"}

Do not simplify it.

Physical card missing on is-jumper

Check:

ssh is-jumper 'ls -l /run/user/0/gnupg/S.gpg-agent.ssh'
ssh is-jumper 'SSH_AUTH_SOCK=/run/user/0/gnupg/S.gpg-agent.ssh ssh-add -L | sed -n 1p'

Expected key output contains:

cardno:6446168

If missing, the issue is on is-jumper: gpg-agent, card mount, permissions, or hardware state.

Direct command works but wrapper fails

Compare generated command behavior:

bash -x ~/.local/bin/ssh porta-sip hostname

Look for:

SSH_AUTH_SOCK=/run/user/0/gnupg/S.gpg-agent.ssh
bogdan.timofte@10.253.51.50

If either is wrong, fix inventory/local overlay or wrapper.

Generated config was edited manually

Discard manual runtime edits by redeploying:

tools/deploy-local.sh

Then verify:

ssh -G j1 | grep -E '^(hostname|user|port) '

Compatibility and Compliance

Do not reintroduce these removed patterns:

j1-relay.sh
ssh-proxy.sh
ensure-ssh-agent-bridge.sh
ensure-ssh-jump.sh
local socket forwarding for the hardware card
Python/base64 port-forwarding relays
per-host local ProxyCommand bridges

Current compliant model:

local wrapper -> ssh is-jumper -> run normal ssh from is-jumper

Compatibility options for old final hosts belong in inventory or on jump hosts, not in ad-hoc local forwarding scripts.

Maintenance Notes for Agents

Before changing anything:

git status --short --branch
which ssh
ssh -G j1 | grep -E '^(hostname|user|port) '

When fixing auth:

1. Identify which hop failed from the user@host shown in the error message.
2. is-jumper failures mean local key/config.
3. J1/J2 failures mean hardware card, SSH_AUTH_SOCK, or jump user.
4. final-host failures mean final host user/auth/port.
5. Apply the fix in inventory or wrapper source, not generated config.
6. Run tools/deploy-local.sh.
7. Run read-only SSH verification.
8. Commit the source change.
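
Steps 1 to 4 can be sketched as a small classifier over the error line. classify_failed_hop is a hypothetical helper. The addresses come from the Architecture section, and anything that is not is-jumper or J1/J2 is assumed to be a final host.

```shell
# Map an ssh error line to the hop that most likely failed.
classify_failed_hop() {
  case "$1" in
    # Local Mac -> is-jumper: check local key/config.
    *@192.168.2.100:*)
      echo "is-jumper" ;;
    # is-jumper -> J1/J2: check hardware card, SSH_AUTH_SOCK, or jump user.
    *@10.253.51.50:*|*@10.253.51.52:*|*@j1.next-gen.ro:*|*@j2.next-gen.ro:*)
      echo "jump" ;;
    # Any other user@host: check final host user/auth/port.
    *@*:*)
      echo "final-host" ;;
    *)
      echo "unknown" ;;
  esac
}

# Usage:
#   classify_failed_hop 'bogdan@10.253.51.50: Permission denied (publickey).'
```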

Do not assume hosts.yaml alone is the effective config. Always remember inventory/hosts-local.yaml is merged in by tools/generate-configs.py.

Do not trust stale docs, comments, or generated files over these commands:

ssh -G <alias>
tools/deploy-local.sh
ssh <alias> hostname
git diff