Architecture: the model takes 128x128x3 images and downsamples them through three convolution + max-pooling stages before passing the result to three fully connected layers. The channel count doubles with every layer. The final stage uses two conv layers instead of one, intended to better capture the smaller features of a playing card. The last fully connected layer has 53 output nodes, one for each of the 53 playing-card classes.
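To make the layer layout concrete, here is a minimal shape-tracing sketch in plain Python. It assumes details the description does not state: 3x3 convolutions with padding 1 (so convs keep the spatial size), a hypothetical base width of 16 channels, channels doubling at every conv layer, 2x2 max pooling with stride 2, and hypothetical hidden FC widths of 256 and 128. Only the 128x128x3 input, the three pooled stages, the two convs in the last stage, the three FC layers and the 53 outputs come from the text above.

```python
# Hypothetical configuration: kernel size, padding, base channel width and
# hidden FC widths are assumptions, not values stated in this README.
KERNEL = 3
PADDING = 1          # "same" padding, so each conv keeps the spatial size
BASE_CHANNELS = 16
HIDDEN_FC = [256, 128]
NUM_CLASSES = 53


def conv_out(size, kernel=KERNEL, padding=PADDING, stride=1):
    """Spatial size after a conv layer (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height=128, width=128, channels=3):
    """Return a list of (layer_name, shape) tuples for the sketched model."""
    shapes = [("input", (channels, height, width))]
    out_ch = BASE_CHANNELS
    # Stages 1 and 2: one conv then a 2x2 max pool; stage 3: two convs then a pool.
    for stage, n_convs in enumerate([1, 1, 2], start=1):
        for i in range(n_convs):
            height, width = conv_out(height), conv_out(width)
            channels = out_ch
            out_ch *= 2  # assumption: channels double at every conv layer
            shapes.append((f"conv{stage}_{i + 1}", (channels, height, width)))
        height, width = height // 2, width // 2  # 2x2 max pool, stride 2
        shapes.append((f"pool{stage}", (channels, height, width)))
    features = channels * height * width
    shapes.append(("flatten", (features,)))
    # Three fully connected layers, the last one producing the 53 class scores.
    for j, units in enumerate(HIDDEN_FC + [NUM_CLASSES], start=1):
        shapes.append((f"fc{j}", (units,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:8s} {shape}")
```

Under these assumptions the last feature map is 128x16x16, so the first fully connected layer receives 32,768 features; changing the kernel, padding or base width changes those numbers but not the overall layout.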
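Design: I wanted a lightweight model that trains quickly. This is mainly achieved by downsampling with three 2x2 max-pooling layers: each one halves the height and width of the feature maps, so the pooling alone shrinks a 128x128 input to 16x16, a 64x reduction in spatial positions.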
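The effect of the pooling on cost can be estimated with a little arithmetic. The sketch below counts convolution multiply-accumulates and the weights of the first fully connected layer with and without the three pools. The configuration is hypothetical (3x3 "same" convs, channel widths 16, 32, 64, 128, a first hidden FC width of 256) and is only meant to illustrate the scaling, not to reproduce the actual model.

```python
# Hypothetical configuration, not taken from this README.
CONV_CHANNELS = [(3, 16), (16, 32), (32, 64), (64, 128)]  # (in, out) per conv
POOL_AFTER = [True, True, False, True]  # stage 3 has two convs, then one pool
KERNEL = 3
FIRST_FC_WIDTH = 256


def cost(size=128, use_pooling=True):
    """Return (conv multiply-accumulates, first-FC weight count)."""
    macs = 0
    channels = 3
    for (c_in, c_out), pool in zip(CONV_CHANNELS, POOL_AFTER):
        # A "same"-padded KxK conv does size*size*c_out*c_in*K*K MACs.
        macs += size * size * c_out * c_in * KERNEL * KERNEL
        channels = c_out
        if pool and use_pooling:
            size //= 2  # 2x2 max pool with stride 2 halves each side
    fc_weights = channels * size * size * FIRST_FC_WIDTH
    return macs, fc_weights


pooled = cost(use_pooling=True)
unpooled = cost(use_pooling=False)
print(f"with pooling:    {pooled[0]:,} conv MACs, {pooled[1]:,} first-FC weights")
print(f"without pooling: {unpooled[0]:,} conv MACs, {unpooled[1]:,} first-FC weights")
```

With these assumed widths, dropping the pools multiplies the first FC layer's weight count by 64 (16x16 versus 128x128 positions) and the convolution cost by roughly 13x, because the deeper, wider convs would then run at full resolution.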