.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "advanced/dynamic_quantization_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_advanced_dynamic_quantization_tutorial.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_advanced_dynamic_quantization_tutorial.py:


(beta) Dynamic Quantization on an LSTM Word Language Model
==================================================================

**Author**: James Reed

**Edited by**: Seth Weidman

Introduction
------------

Quantization involves converting the weights and activations of your model from float
to int, which can result in smaller model size and faster inference with only a small
hit to accuracy.

In this tutorial, we will apply the easiest form of quantization - dynamic
quantization - to an LSTM-based next-word prediction model, closely following the
word language model from the PyTorch examples repository.

.. GENERATED FROM PYTHON SOURCE LINES 22-32

.. code-block:: default


    # imports
    import os
    from io import open
    import time

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


.. GENERATED FROM PYTHON SOURCE LINES 33-39

1. Define the model
-------------------

Here we define the LSTM model architecture, following the model from the
word language model example.

.. GENERATED FROM PYTHON SOURCE LINES 39-73

.. code-block:: default


    class LSTMModel(nn.Module):
        """Container module with an encoder, a recurrent module, and a decoder."""

        def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5):
            super(LSTMModel, self).__init__()
            self.drop = nn.Dropout(dropout)
            self.encoder = nn.Embedding(ntoken, ninp)
            self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
            self.decoder = nn.Linear(nhid, ntoken)

            self.init_weights()

            self.nhid = nhid
            self.nlayers = nlayers

        def init_weights(self):
            initrange = 0.1
            self.encoder.weight.data.uniform_(-initrange, initrange)
            self.decoder.bias.data.zero_()
            self.decoder.weight.data.uniform_(-initrange, initrange)

        def forward(self, input, hidden):
            emb = self.drop(self.encoder(input))
            output, hidden = self.rnn(emb, hidden)
            output = self.drop(output)
            decoded = self.decoder(output)
            return decoded, hidden

        def init_hidden(self, bsz):
            weight = next(self.parameters())
            return (weight.new_zeros(self.nlayers, bsz, self.nhid),
                    weight.new_zeros(self.nlayers, bsz, self.nhid))

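As a quick, optional sanity check (not part of the original tutorial), the sketch below
instantiates a tiny model with made-up sizes, builds an initial hidden state, and runs a
forward pass on random token indices. All sizes here are arbitrary illustrative values;
it simply confirms the shapes produced by ``forward``.

.. code-block:: python


    # Hypothetical smoke test with arbitrary small sizes (reuses the imports above).
    toy_model = LSTMModel(ntoken=100, ninp=16, nhid=32, nlayers=2)
    toy_model.eval()

    dummy_input = torch.randint(100, (5, 3), dtype=torch.long)  # (seq_len=5, batch=3)
    hidden = toy_model.init_hidden(3)
    with torch.no_grad():
        output, hidden = toy_model(dummy_input, hidden)

    print(output.shape)  # torch.Size([5, 3, 100]): one score per vocabulary word, per position
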
.. GENERATED FROM PYTHON SOURCE LINES 74-82

2. Load in the text data
------------------------

Next, we load the Wikitext-2 dataset into a ``Corpus``, again following the
preprocessing from the word language model example.

.. GENERATED FROM PYTHON SOURCE LINES 82-132

.. code-block:: default


    class Dictionary(object):
        def __init__(self):
            self.word2idx = {}
            self.idx2word = []

        def add_word(self, word):
            if word not in self.word2idx:
                self.idx2word.append(word)
                self.word2idx[word] = len(self.idx2word) - 1
            return self.word2idx[word]

        def __len__(self):
            return len(self.idx2word)

    class Corpus(object):
        def __init__(self, path):
            self.dictionary = Dictionary()
            self.train = self.tokenize(os.path.join(path, 'train.txt'))
            self.valid = self.tokenize(os.path.join(path, 'valid.txt'))
            self.test = self.tokenize(os.path.join(path, 'test.txt'))

        def tokenize(self, path):
            """Tokenizes a text file."""
            assert os.path.exists(path)
            # Add words to the dictionary
            with open(path, 'r', encoding="utf8") as f:
                for line in f:
                    words = line.split() + ['<eos>']
                    for word in words:
                        self.dictionary.add_word(word)

            # Tokenize file content
            with open(path, 'r', encoding="utf8") as f:
                idss = []
                for line in f:
                    words = line.split() + ['<eos>']
                    ids = []
                    for word in words:
                        ids.append(self.dictionary.word2idx[word])
                    idss.append(torch.tensor(ids).type(torch.int64))
                ids = torch.cat(idss)

            return ids

    model_data_filepath = 'data/'

    corpus = Corpus(model_data_filepath + 'wikitext-2')


.. GENERATED FROM PYTHON SOURCE LINES 133-149

3. Load the pretrained model
-----------------------------

This is a tutorial on dynamic quantization, a quantization technique
that is applied after a model has been trained. Therefore, we'll simply
load some pretrained weights into this model architecture; these
weights were obtained by training for five epochs using the default
settings in the word language model example.

Before running this tutorial, download the required pre-trained model:

.. code-block:: bash

    wget https://s3.amazonaws.com/pytorch-tutorial-assets/word_language_model_quantize.pth

Place the downloaded file in the data directory or update the
``model_data_filepath`` accordingly.

.. GENERATED FROM PYTHON SOURCE LINES 149-170

.. code-block:: default


    ntokens = len(corpus.dictionary)

    model = LSTMModel(
        ntoken = ntokens,
        ninp = 512,
        nhid = 256,
        nlayers = 5,
    )

    model.load_state_dict(
        torch.load(
            model_data_filepath + 'word_language_model_quantize.pth',
            map_location=torch.device('cpu'),
            weights_only=True
            )
        )

    model.eval()
    print(model)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    LSTMModel(
      (drop): Dropout(p=0.5, inplace=False)
      (encoder): Embedding(33278, 512)
      (rnn): LSTM(512, 256, num_layers=5, dropout=0.5)
      (decoder): Linear(in_features=256, out_features=33278, bias=True)
    )

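Before quantizing anything, it is worth knowing how big this floating-point model is.
As a rough, optional check (not part of the original tutorial), we can count the
parameters and estimate their float32 storage; the result should land close to the
~114 MB file size reported in the size comparison later in this tutorial.

.. code-block:: python


    # Rough estimate: 4 bytes per float32 parameter.
    n_params = sum(p.numel() for p in model.parameters())
    print('parameters: {:,}'.format(n_params))
    print('approx. float32 size (MB): {:.1f}'.format(n_params * 4 / 1e6))
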
.. GENERATED FROM PYTHON SOURCE LINES 171-174

Now let's generate some text to ensure that the pretrained model is working
properly; as before, we follow the text generation script from the word
language model example.

.. GENERATED FROM PYTHON SOURCE LINES 174-199

.. code-block:: default


    input_ = torch.randint(ntokens, (1, 1), dtype=torch.long)
    hidden = model.init_hidden(1)
    temperature = 1.0
    num_words = 1000

    with open(model_data_filepath + 'out.txt', 'w') as outf:
        with torch.no_grad():  # no tracking history
            for i in range(num_words):
                output, hidden = model(input_, hidden)
                word_weights = output.squeeze().div(temperature).exp().cpu()
                word_idx = torch.multinomial(word_weights, 1)[0]
                input_.fill_(word_idx)

                word = corpus.dictionary.idx2word[word_idx]

                outf.write(str(word.encode('utf-8')) + ('\n' if i % 20 == 19 else ' '))

                if i % 100 == 0:
                    print('| Generated {}/{} words'.format(i, 1000))

    with open(model_data_filepath + 'out.txt', 'r') as outf:
        all_output = outf.read()
        print(all_output)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    | Generated 0/1000 words
    | Generated 100/1000 words
    | Generated 200/1000 words
    | Generated 300/1000 words
    | Generated 400/1000 words
    | Generated 500/1000 words
    | Generated 600/1000 words
    | Generated 700/1000 words
    | Generated 800/1000 words
    | Generated 900/1000 words
    b'(' b'with' b'reinstate' b'@-@' b'coloured' b'MXN' b')' b',' b'as' b'they' b'set' b'Doggett' b'of' b'its' b'own' b'boundaries' b'like' b'general' b'sports' b','
    b'coloration' b'against' b'Antarctic' b',' b'Alfa' b',' b'Arab' b',' b'rooted' b'Dollo' b'and' b'John' b'' b'.' b'These' b'governmental' b'convictions' b'are' b'further' b'used'
    b',' b'and' b'the' b'vision' b'of' b'relayed' b'caused' b'rather' b'much' b'commonly' b'clouds' b'his' b'own' b'repertoire' b'.' b'Eventually' b'with' b'their' b'affection' b'from'
    ...

.. GENERATED FROM PYTHON SOURCE LINES 200-205

It's no GPT-2, but it looks like the model has started to learn the structure of
language!

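As a brief aside (not part of the original tutorial), the ``temperature`` value in the
generation loop above rescales the model's scores before sampling: values below 1 sharpen
the distribution toward the most likely words, while values above 1 flatten it. A minimal
sketch with made-up scores for a hypothetical 5-word vocabulary:

.. code-block:: python


    # Hypothetical scores over a 5-word vocabulary, purely for illustration.
    fake_logits = torch.tensor([2.0, 1.0, 0.5, 0.1, -1.0])
    for temperature in (0.5, 1.0, 2.0):
        probs = F.softmax(fake_logits / temperature, dim=0)
        print(temperature, [round(p, 3) for p in probs.tolist()])

Lower temperatures concentrate probability mass on the top-scoring word; higher
temperatures make the sampling more uniform (and the generated text more random).
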
We're almost ready to demonstrate dynamic quantization. We just need to define a few more
helper functions:

.. GENERATED FROM PYTHON SOURCE LINES 205-250

.. code-block:: default


    bptt = 25
    criterion = nn.CrossEntropyLoss()
    eval_batch_size = 1

    # create test data set
    def batchify(data, bsz):
        # Work out how cleanly we can divide the dataset into ``bsz`` parts.
        nbatch = data.size(0) // bsz
        # Trim off any extra elements that wouldn't cleanly fit (remainders).
        data = data.narrow(0, 0, nbatch * bsz)
        # Evenly divide the data across the ``bsz`` batches.
        return data.view(bsz, -1).t().contiguous()

    test_data = batchify(corpus.test, eval_batch_size)

    # Evaluation functions
    def get_batch(source, i):
        seq_len = min(bptt, len(source) - 1 - i)
        data = source[i:i+seq_len]
        target = source[i+1:i+1+seq_len].reshape(-1)
        return data, target

    def repackage_hidden(h):
        """Wraps hidden states in new Tensors, to detach them from their history."""

        if isinstance(h, torch.Tensor):
            return h.detach()
        else:
            return tuple(repackage_hidden(v) for v in h)

    def evaluate(model_, data_source):
        # Turn on evaluation mode which disables dropout.
        model_.eval()
        total_loss = 0.
        hidden = model_.init_hidden(eval_batch_size)
        with torch.no_grad():
            for i in range(0, data_source.size(0) - 1, bptt):
                data, targets = get_batch(data_source, i)
                output, hidden = model_(data, hidden)
                hidden = repackage_hidden(hidden)
                output_flat = output.view(-1, ntokens)
                total_loss += len(data) * criterion(output_flat, targets).item()
        return total_loss / (len(data_source) - 1)

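``evaluate`` returns an average cross-entropy loss over the test set. Language models are
often summarized by perplexity, which is simply the exponential of that loss. As a small
optional aside (not part of the original tutorial), using a loss of about 5.17 like the
one reported in the results below:

.. code-block:: python


    import math

    example_loss = 5.17  # roughly the test loss reported in the results below
    print('perplexity: {:.1f}'.format(math.exp(example_loss)))  # prints approximately 175.9
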
.. GENERATED FROM PYTHON SOURCE LINES 251-260

4. Test dynamic quantization
----------------------------

Finally, we can call ``torch.quantization.quantize_dynamic`` on the model!
Specifically,

- We specify that we want the ``nn.LSTM`` and ``nn.Linear`` modules in our
  model to be quantized
- We specify that we want weights to be converted to ``int8`` values

.. GENERATED FROM PYTHON SOURCE LINES 260-268

.. code-block:: default


    import torch.quantization

    quantized_model = torch.quantization.quantize_dynamic(
        model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
    )
    print(quantized_model)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    LSTMModel(
      (drop): Dropout(p=0.5, inplace=False)
      (encoder): Embedding(33278, 512)
      (rnn): DynamicQuantizedLSTM(512, 256, num_layers=5, dropout=0.5)
      (decoder): DynamicQuantizedLinear(in_features=256, out_features=33278, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
    )

.. GENERATED FROM PYTHON SOURCE LINES 269-271

The model looks the same; how has this benefited us? First, we see a
significant reduction in model size:

.. GENERATED FROM PYTHON SOURCE LINES 271-280

.. code-block:: default


    def print_size_of_model(model):
        torch.save(model.state_dict(), "temp.p")
        print('Size (MB):', os.path.getsize("temp.p")/1e6)
        os.remove('temp.p')

    print_size_of_model(model)
    print_size_of_model(quantized_model)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Size (MB): 113.944455
    Size (MB): 79.738939

.. GENERATED FROM PYTHON SOURCE LINES 281-285

Second, we see faster inference time, with no difference in evaluation loss.

Note: we set the number of threads to one for single threaded comparison, since
quantized models run single threaded.

.. GENERATED FROM PYTHON SOURCE LINES 285-297

.. code-block:: default


    torch.set_num_threads(1)

    def time_model_evaluation(model, test_data):
        s = time.time()
        loss = evaluate(model, test_data)
        elapsed = time.time() - s
        print('''loss: {0:.3f}\nelapsed time (seconds): {1:.1f}'''.format(loss, elapsed))

    time_model_evaluation(model, test_data)
    time_model_evaluation(quantized_model, test_data)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    loss: 5.167
    elapsed time (seconds): 199.2
    loss: 5.168
    elapsed time (seconds): 114.2

.. GENERATED FROM PYTHON SOURCE LINES 298-309

Running this locally on a MacBook Pro, without quantization, inference takes about 200
seconds, and with quantization it takes just about 100 seconds.

Conclusion
----------

Dynamic quantization can be an easy way to reduce model size while only
having a limited effect on accuracy.

Thanks for reading! As always, we welcome any feedback, so please create an issue
on GitHub if you have any.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 5 minutes 22.662 seconds)


.. _sphx_glr_download_advanced_dynamic_quantization_tutorial.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: dynamic_quantization_tutorial.py <dynamic_quantization_tutorial.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: dynamic_quantization_tutorial.ipynb <dynamic_quantization_tutorial.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_