I don't think you should visualize this information. All you would see is that the L2 norm decreases over time (since it is the function being minimized) and that accuracy increases. But since F1 is so high, I suspect these metrics were computed on the training data. So I would recommend producing a report like Micro P, R, F1: 0.9771 (384/393), 0.9821 (384/391), 0.9796 on test data (data that is not used for training) and plotting iteration vs. F1.
Then you will see where you actually start overfitting: it is the peak of that plot, after which F1 on the unseen data starts to drop.
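As a rough sketch of that workflow, the micro-averaged numbers in such a report come from pooled counts: precision = correct / predicted, recall = correct / gold, and F1 is their harmonic mean. The helper names `micro_prf` and `best_iteration` below are hypothetical, and the per-iteration F1 values are made up purely for illustration. In practice you would evaluate your model on held-out data after each iteration, collect the F1 scores, and plot them (e.g. with any plotting library) to see the peak.

```python
def micro_prf(tp, n_pred, n_gold):
    """Micro-averaged precision, recall and F1 from pooled counts.

    tp     -- correctly predicted items, summed over all classes
    n_pred -- total number of predicted items
    n_gold -- total number of gold (reference) items
    """
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def best_iteration(f1_by_iter):
    """Return (iteration, f1) where held-out F1 peaks; iterations are 1-based."""
    best = max(range(len(f1_by_iter)), key=lambda i: f1_by_iter[i])
    return best + 1, f1_by_iter[best]


if __name__ == "__main__":
    # Counts taken from the example report line (384/393, 384/391).
    p, r, f1 = micro_prf(384, 393, 391)
    print(f"Micro P, R, F1: {p:.4f}, {r:.4f}, {f1:.4f}")
    # Hypothetical held-out F1 per training iteration, for illustration only.
    dev_f1 = [0.71, 0.80, 0.84, 0.86, 0.85, 0.83]
    it, score = best_iteration(dev_f1)
    print(f"Held-out F1 peaks at iteration {it} ({score:.2f})")
```

With the counts from the report this prints `Micro P, R, F1: 0.9771, 0.9821, 0.9796`; training past the peak iteration is where overfitting begins.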
For your own analysis you should plot accuracy vs. time, so you know when you start to overfit. For publication, you can pick the metrics others have reported, so you can compare to them.
I can't really give you an answer, but what I can give you is a way toward one: you have to find the angle that you relate to or that piques your interest. A good paper is one that people get drawn into because it reaches them in some way. As for me, when I think of WWII, I think of the Holocaust and the effect it had on the survivors, their families, and those who stood by and did nothing until it was too late.