Expected 3-dimensional input for 3-dimensional weight [512, 512, 5], but got 4-dimensional input of size [1, 0, 1, 512] instead

I have trained a TTS model (a Tacotron2 model for generating audio samples), but when I run the inference code I get the error above. I have tried multiple approaches, including reshape and unsqueeze on the input, but nothing has worked. Please help me resolve this!
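To illustrate what I understand the error to mean (a minimal sketch, not my actual model): a Conv1d whose weight has shape [512, 512, 5] expects a 3-dimensional input of (batch, channels, length), and feeding it a 4-dimensional tensor reproduces the same kind of RuntimeError on my PyTorch version:

import torch
import torch.nn as nn

conv = nn.Conv1d(512, 512, kernel_size=5, padding=2)  # weight shape: [512, 512, 5]
ok = torch.randn(1, 512, 20)     # (batch, channels, length) -> works
print(conv(ok).shape)            # torch.Size([1, 512, 20])
bad = torch.randn(1, 0, 1, 512)  # 4-D, like my input -> RuntimeError
conv(bad)

The relevant part of my model: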

(embedding): Embedding(148, 512)
(encoder): Encoder(
(convolutions): ModuleList(
  (0): Sequential(
    (0): ConvNorm(
      (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
    )
    (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (1): Sequential(
    (0): ConvNorm(
      (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
    )
    (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (2): Sequential(
    (0): ConvNorm(
      (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
    )
    (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
(lstm): LSTM(512, 256, batch_first=True, bidirectional=True)
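From reading the Tacotron2 source, my understanding is that text reaches these convolutions as token ids of shape (batch, text_length), which the embedding maps to (batch, text_length, 512) and a transpose turns into the (batch, 512, text_length) layout that Conv1d wants. A small sketch with made-up sizes:

import torch

ids = torch.randint(0, 148, (1, 20))     # (batch, text_length) token ids
emb = torch.nn.Embedding(148, 512)(ids)  # (1, 20, 512)
x = emb.transpose(1, 2)                  # (1, 512, 20): 3-D, as Conv1d expects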

Actual Code:

import torch
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer  # the Keras tokenizer
from scipy.io.wavfile import write

waveglow_path = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_waveglow')
#waveglow = torch.load(waveglow_path, map_location=torch.device("cpu"))  # waveglow_path is the path to the .pth file
#waveglow = torch.load(waveglow_path)
waveglow = waveglow_path.to('cuda')
waveglow.eval()
#waveglow.cpu().eval()


tacotron2 = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_tacotron2')
model_path = '/content/TTS_Nigerian'
#tacotron2 = torch.load(model_path, map_location=torch.device("cpu"))
#tacotron2 = torch.load(model_path)
tacotron2.load_state_dict(torch.load(model_path)['state_dict'])
print(tacotron2.state_dict)
tacotron2 = tacotron2.to('cuda')
tacotron2.eval()


tk = Tokenizer()
text = "Now please click on the start button or say 'start' to proceed"
tk.fit_on_texts(text)
sequence = np.array(tk.texts_to_sequences([text]))[None, :]
#sequence = sequence.astype(int)
sequence = torch.from_numpy(sequence).to(device='cuda', dtype=torch.int64)
print(sequence.shape)

torch.squeeze(sequence, [1, 3])
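For comparison, the PyTorch Hub page for nvidia_tacotron2 prepares the input with its own utils rather than a Keras Tokenizer (I have not yet tried whether this works with my fine-tuned checkpoint):

utils = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
sequences, lengths = utils.prepare_input_sequence([text])
print(sequences.shape)  # 2-D: (batch, max_text_length)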

with torch.no_grad():
    mel_output, mel_output_postnet, _, alignment = tacotron2.infer(sequence, input_lengths=20)
    audio = waveglow.infer(mel_output_postnet)

audio_numpy = audio[0].data.cpu().numpy()
sampling_rate = 22050

write("audio.wav", sampling_rate, audio_numpy)

from IPython.display import Audio
Audio(audio_numpy, rate=sampling_rate)
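For reference, the Hub page's own inference loop unpacks three return values from infer and passes the lengths tensor that prepare_input_sequence produced, unlike my call above:

with torch.no_grad():
    mel, _, _ = tacotron2.infer(sequences, lengths)
    audio = waveglow.infer(mel)
audio_numpy = audio[0].data.cpu().numpy()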