{/* This page is auto-generated from the skill’s SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
Songsee
Audio spectrograms/features (mel, chroma, MFCC) via CLI.
Skill metadata
| Source | Bundled (installed by default) |
| Path | skills/media/songsee |
| Version | 1.0.0 |
| Author | community |
| License | MIT |
| Platforms | linux, macos, windows |
| Tags | Audio, Visualization, Spectrogram, Music, Analysis |
Reference: full SKILL.md
:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::
songsee
Generate spectrograms and multi-panel audio feature visualizations from audio files.
Prerequisites
Requires Go:
go install github.com/steipete/songsee/cmd/songsee@latest
Optional: ffmpeg for formats beyond WAV/MP3.
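One quick way to confirm the prerequisites is to check which tools are on `PATH`. The sketch below is plain POSIX shell; the `have` helper is a hypothetical name introduced for illustration and is not part of songsee.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Report the status of the required and optional tools.
for tool in go songsee ffmpeg; do
  if have "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

If `songsee` is reported missing right after `go install`, a common cause is that Go's binary directory (`$GOBIN`, or `$GOPATH/bin`, which defaults to `$HOME/go/bin`) is not on `PATH`.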
Quick Start
# Basic spectrogram
songsee track.mp3
# Save to specific file
songsee track.mp3 -o spectrogram.png
# Multi-panel visualization grid
songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux
# Time slice (start at 12.5s, 8s duration)
songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg
# From stdin
cat track.mp3 | songsee - --format png -o out.png
Visualization Types
Use --viz with comma-separated values:
| Type | Description |
|---|---|
| spectrogram | Standard frequency spectrogram |
| mel | Mel-scaled spectrogram |
| chroma | Pitch class distribution |
| hpss | Harmonic/percussive separation |
| selfsim | Self-similarity matrix |
| loudness | Loudness over time |
| tempogram | Tempo estimation |
| mfcc | Mel-frequency cepstral coefficients |
| flux | Spectral flux (onset detection) |
Multiple --viz types render as a grid in a single image.
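When the panel selection is chosen in a script, the comma-separated value for `--viz` can be assembled from individual type names. This is a minimal sketch: `join_viz` is a hypothetical helper, `panels.png` is an assumed output name, and the command is printed for review rather than executed.

```shell
# Hypothetical helper: join all arguments with commas.
# The body runs in a subshell so the IFS change does not leak out.
join_viz() ( IFS=,; printf '%s\n' "$*" )

# Pick a subset of the visualization types listed above.
panels=$(join_viz mel chroma mfcc)

# Print the resulting command instead of running it.
echo "songsee track.mp3 --viz $panels -o panels.png"
```

Printing the command first makes it easy to check the panel list before rendering a large grid.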
Common Flags
| Flag | Description |
|---|---|
| --viz | Visualization types (comma-separated) |
| --style | Color palette: classic, magma, inferno, viridis, gray |
| --width / --height | Output image dimensions |
| --window / --hop | FFT window and hop size |
| --min-freq / --max-freq | Frequency range filter |
| --start / --duration | Time slice of the audio |
| --format | Output format: jpg or png |
| -o | Output file path |
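Several of these flags can be combined in one invocation. The sketch below wraps such a call in a hypothetical `render_band` function with a dry-run switch. The frequency values are illustrative and assumed to be in Hz, which the table above does not state.

```shell
# Hypothetical helper: render a mel spectrogram limited to a frequency band.
# Arguments: input file, min frequency, max frequency, output file.
# With DRY_RUN=1 the command is printed instead of executed.
render_band() (
  input=$1 lo=$2 hi=$3 out=$4
  set -- songsee "$input" --viz mel --style magma \
    --min-freq "$lo" --max-freq "$hi" --format png -o "$out"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
)

# Preview the command for an assumed 50 to 8000 band.
DRY_RUN=1 render_band track.mp3 50 8000 band.png
```

Dropping `DRY_RUN=1` runs the same argument list directly, so the previewed command and the executed one cannot drift apart.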
Notes
- WAV and MP3 are decoded natively; other formats require ffmpeg
- Output images can be inspected with vision_analyze for automated audio analysis
- Useful for comparing audio outputs, debugging synthesis, or documenting audio processing pipelines
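For the comparison use case, rendering each file with identical settings keeps the resulting images directly comparable. The sketch below only prints the commands. The file names `before.wav` and `after.wav` and the `image_name` helper are assumptions made for illustration.

```shell
# Hypothetical helper: derive an image name from an audio path,
# e.g. out/before.wav -> before.mel.png
image_name() {
  base=${1##*/}                 # strip any leading directories
  echo "${base%.*}.mel.png"     # replace the extension
}

# Print one songsee command per file, using the same flags for each.
for f in before.wav after.wav; do
  echo "songsee $f --viz mel --style viridis --format png -o $(image_name "$f")"
done
```

The generated images can then be viewed side by side or passed to vision_analyze as described above.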