i trained a crisis PR model and it said 'bitch investigations' once
before IDK-1, before DFD-1, there was NanoPR. a tiny GPT built from scratch in PyTorch. ~630K parameters. character-level tokenizer with a vocab size of 72. three transformer layers. the kind of model you build when you're following the nanoGPT blueprint for the first time.
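for anyone who hasn't seen one, here is a minimal sketch of what a character-level tokenizer looks like. this is not the actual NanoPR code: the sample text and the names `stoi`, `itos`, `encode`, and `decode` are assumptions for illustration, following the usual nanoGPT-style pattern. the vocab is just every distinct character in the corpus, which is how a number like 72 comes about.

```python
# hypothetical sample text; the real NanoPR corpus is not shown in this post
text = "We take these allegations seriously."

# vocabulary = every distinct character in the corpus, sorted for stable ids
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character


def encode(s):
    """map a string to a list of integer ids, one per character"""
    return [stoi[ch] for ch in s]


def decode(ids):
    """map a list of integer ids back to a string"""
    return "".join(itos[i] for i in ids)


vocab_size = len(chars)
ids = encode("seriously")
print(vocab_size, ids, decode(ids))
```

because the model predicts one character at a time, nothing forces the characters to add up to a real word. that is the tradeoff for a tiny vocab.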
the dataset was crisis PR scenarios. formal statements, reputation management responses, crisis communication frameworks. i'm a digital PR student. i figured i'd train something in my own domain before jumping to indonesian LLMs.
at some point during generation it produced 'Bitch Investigations' as part of a formal crisis statement. i don't remember the exact prompt. i don't know what context led to it. the model just decided that was the right move. best val loss was 1.03. i considered this a success.
the actual result was coherent enough if you squinted. it understood the structure of formal statements: opening line, body, closing. the words weren't always real words (character-level will do that), but the shape was right. for 630K params trained on 117K characters, that's fine.
what i actually learned: data quality beat data quantity here. run 6, with 117K chars of homogeneous data (Claude + Perplexity only), outperformed run 3, with 718K chars from 10 different AI sources. my read is that the model got confused by stylistic inconsistency, not by a lack of data.
NanoPR is done. it lives in ~/Dev/nanopr/ and i don't plan to touch it. it taught me the fundamentals — tokenizer, attention, residual connections, overfitting, temperature sampling. everything i needed to not be completely lost when building IDK-1.
anyway. 'Bitch Investigations.' still my favorite output from any model i've built.