Turns out my GPU wasn’t speeding up my training because I wasn’t batching my data: the dataset is ragged, and at first I had no idea how to batch it.
After figuring out how masking works, I got the data batched. Now I have to figure out how to handle…
…this.
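
For anyone stuck on the same ragged-data problem, here is a rough sketch of the pad-and-mask idea in plain Python. It is not the actual training code: `pad_and_mask`, `masked_mean`, and the toy sequences are hypothetical names and data made up for illustration, and a real pipeline would more likely use the padding/masking utilities of whatever framework is doing the training.

```python
def pad_and_mask(sequences, pad_value=0.0):
    """Pad ragged sequences to a common length and build a 0/1 mask.

    Hypothetical helper: 1.0 marks a real value, 0.0 marks padding.
    Assumes `sequences` is non-empty.
    """
    max_len = max(len(seq) for seq in sequences)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_value] * n_pad)
        mask.append([1.0] * len(seq) + [0.0] * n_pad)
    return padded, mask


def masked_mean(padded, mask):
    """Per-sequence mean that ignores the padded positions."""
    means = []
    for row, row_mask in zip(padded, mask):
        total = sum(x * m for x, m in zip(row, row_mask))
        count = sum(row_mask)
        # Guard against an all-padding row to avoid dividing by zero.
        means.append(total / count if count else 0.0)
    return means


# Toy ragged dataset: three sequences of different lengths.
ragged = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
padded, mask = pad_and_mask(ragged)

print(padded)  # rectangular batch, short rows filled with pad_value
print(mask)    # same shape, 1.0 where the data is real
print(masked_mean(padded, mask))
```

The point of the mask is that the padded batch is rectangular, so it can go to the GPU in one piece, while the padding itself never leaks into the loss or any averages.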