what i learned building an indonesian slm as a non-cs student
my degree is Digital Public Relations. not Computer Science. not Electrical Engineering. not even Information Systems. PR. the one where you learn how to manage brand reputation and write press releases.
i've trained three models now. NanoPR (~630K params), DFD-1 (500M), IDK-1 (100M, currently training). none of this was required for my degree. none of this will show up in my GPA. i did it because i couldn't not do it.
here's what i actually learned — not from courses, but from things breaking.
lesson 1: you don't need to understand everything to start. i didn't fully understand backpropagation when i trained NanoPR. i didn't understand attention when i started DFD-1. i understood them after — because building gave me the intuition that made the theory click. if i had waited until i 'understood enough', i'd still be waiting.
lesson 2: data quality is the whole game. DFD-1 didn't fail because of its architecture. the architecture was fine. it failed because i trained it on noisy web crawl data. the model learned to repeat garbage because the training data was garbage. i spent weeks on model design and zero time on data cleaning. never again.
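to make "data cleaning" concrete, here's a minimal sketch of the kind of filter i mean. this is not the actual pipeline from any of my models. the function names, the thresholds (`min_words`, `min_alpha`) and the sample sentences are all made up for illustration, and real web crawl data needs a lot more than this.

```python
import re

def normalize(text):
    # collapse whitespace and lowercase so trivially different copies compare equal
    return re.sub(r"\s+", " ", text).strip().lower()

def alpha_ratio(text):
    # share of characters that are letters or whitespace
    if not text:
        return 0.0
    good = sum(ch.isalpha() or ch.isspace() for ch in text)
    return good / len(text)

def clean_corpus(docs, min_words=5, min_alpha=0.8):
    # thresholds are illustrative assumptions, not tuned values
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # too short to teach the model anything (menus, buttons)
        if alpha_ratio(norm) < min_alpha:
            continue  # mostly symbols and digits (spam, markup leftovers)
        if norm in seen:
            continue  # exact duplicate after normalization
        seen.add(norm)
        kept.append(doc.strip())
    return kept

# hypothetical sample documents
docs = [
    "Saya sedang belajar melatih model bahasa kecil.",
    "saya  sedang belajar melatih model bahasa kecil.",
    "klik di sini",
    "$$$ 100% >>> 0812-3456-7890 <<< !!! ### 50% @@@",
    "Data yang bersih lebih penting daripada arsitektur yang rumit.",
]

cleaned = clean_corpus(docs)
print(len(docs), "->", len(cleaned))
```

three cheap checks (length, symbol ratio, duplicates) already throw out the duplicate, the button text and the spam line here. it's crude, but it's the kind of thing i should have run before touching the model design.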
lesson 3: iteration speed matters more than scale. 500M sounds better than 100M. it isn't, if one training cycle takes 3 weeks at 500M and 1 week at 100M. with the smaller model you get 3x as many cycles in the same time, so you learn 3x faster. scale up after you know what works.
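the arithmetic is simple, but writing it down helped me. a small sketch, assuming a 12-week window (that number is made up, the cycle lengths are the ones above):

```python
def cycles_in(budget_weeks, weeks_per_cycle):
    # number of complete train-and-evaluate cycles that fit in the budget
    return budget_weeks // weeks_per_cycle

budget_weeks = 12  # hypothetical time window

small_cycles = cycles_in(budget_weeks, 1)  # 100M model, 1 week per cycle
large_cycles = cycles_in(budget_weeks, 3)  # 500M model, 3 weeks per cycle

print("100M:", small_cycles, "cycles")
print("500M:", large_cycles, "cycles")
print("ratio:", small_cycles / large_cycles)
```

every cycle is one chance to find out that something is broken. more cycles, more chances.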
lesson 4: free compute is enough to start. kaggle gives about 30 hours of GPU time per week, and a T4 is one of the options. that's real. IDK-1 is being trained on that. Nala was fine-tuned on that in 30 minutes. the bottleneck is never compute at the beginning. it's always clarity on what you're building and why.
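if you want to check whether a free quota is enough for your own plan, a rough estimate goes like this. the token count and throughput below are placeholder numbers, not measurements from IDK-1 or any real run. measure your own tokens per second first.

```python
import math

def weeks_needed(total_tokens, tokens_per_second, gpu_hours_per_week=30):
    # how many weekly quotas it takes to process total_tokens once
    tokens_per_week = tokens_per_second * 3600 * gpu_hours_per_week
    return math.ceil(total_tokens / tokens_per_week)

# hypothetical plan: 2B training tokens at 10k tokens/second
plan_weeks = weeks_needed(2_000_000_000, 10_000)
print("weeks of free quota needed:", plan_weeks)
```

if the answer comes out as a few weeks, free compute is fine. if it comes out as a year, shrink the model or the dataset before looking for more GPUs.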
i'm not saying background doesn't matter. it does. i have gaps — math mostly. linear algebra, calculus, the formal stuff. i'm filling them. but the gaps didn't stop me from shipping. they just mean i have more to learn, which is fine. everyone does.