Mobile Programming Assignment: Fragment Navigation Codelab

Overview

AndroidTrivia - starter code

Starter code for Android Kotlin Fundamentals codelab 3.1: Create and add a fragment.

Introduction

The AndroidTrivia app asks the user trivia questions about Android development. It makes use of the navigation component within Jetpack to move the user between screens. Each screen is implemented as a fragment.

The app navigates using buttons, the app bar, and a navigation drawer. Because students haven't yet learned about saving data or the Android lifecycle, the app tries to eliminate bugs caused by configuration changes.

Prerequisites

You need to know:

  • The fundamentals of Kotlin.
  • How to create basic Android apps in Kotlin.
  • How to open, build, and run apps with Android Studio.
  • How to work with layouts.

Getting started

  1. Download and run the app.

License

Copyright 2019 Google, Inc.

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • example of trainable optimizer?

    I suggest a full example that implements the optimizer, but a trainable step size could be a good example too...

    https://discuss.pytorch.org/t/implement-a-meta-trainable-step-size/70396
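
    A minimal sketch of one way a trainable step size could look with higher's override argument (my own illustration on a toy regression problem, not an official example; the variable names are made up):

    import torch
    import torch.nn.functional as F
    import higher

    net = torch.nn.Linear(4, 1)
    meta_lr = torch.tensor(0.1, requires_grad=True)        # the trainable inner step size
    inner_opt = torch.optim.SGD(net.parameters(), lr=0.1)  # this lr is overridden below
    meta_opt = torch.optim.Adam([meta_lr], lr=1e-2)
    x, y = torch.randn(16, 4), torch.randn(16, 1)

    for _ in range(10):
        meta_opt.zero_grad()
        # override maps optimizer hyperparameters to tensors, so they become part of
        # the unrolled graph and receive gradients from the outer loss.
        with higher.innerloop_ctx(net, inner_opt, override={'lr': [meta_lr]}) as (fnet, diffopt):
            for _ in range(3):                              # unrolled inner loop
                diffopt.step(F.mse_loss(fnet(x), y))
            outer_loss = F.mse_loss(fnet(x), y)
        outer_loss.backward()                               # populates meta_lr.grad
        meta_opt.step()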

    good first issue help wanted 
    opened by renesax14 43
  • Higher leaks memory with track_higher_grad = off.

    If I understand correctly, with track_higher_grad = off, using it to train should be exactly the same as not using it (automatic-differentiation-wise), except that the imperative update on a tensor is replaced with creating a new tensor and moving the reference to the new one. However, I found that it still uses up more and more memory, especially on diffopt.step(loss). I think the problem is that PyTorch can't distinguish between a normal tensor operation and a weight update, as they are all functional operations on tensors. Hence, when you do a functional update, an additional graph is still created, forcing the old weights to stay in memory. More specifically, maybe you should call detach() before https://github.com/facebookresearch/higher/blob/master/higher/optim.py#L249 ?

    Below is a gist that reproduces the memory leak: https://gist.github.com/MarisaKirisame/e4a48617dbe25eee94f08ab9f7c49d99

    opened by MarisaKirisame 21
  • Question about second-order gradients for GRU

    Hi, I was trying to use the package for obtaining second-order gradients through the optimization process of a model with GRU units each followed by a linear layer. However, when I check torch.autograd.grad(loss, learner_fmodel.parameters(time=0)) I get the error RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.. With allow_unused=True, I see that the gradients with respect to the GRU parameters are None whereas the gradient with respect to the linear layer has values. I was wondering if this is indeed supported for GRU?

    opened by Nithin-Holla 14
  • Improving documentation for copy_initial_weights

    I suggest improving the English used in the documentation for the following:

    copy_initial_weights – if true, the weights of the patched module are copied to form the initial weights of the patched module, and thus are not part of the gradient tape when unrolling the patched module. If this is set to False, the actual module weights will be the initial weights of the patched module. This is useful when doing MAML, for example.

    For example, "the weights of the patched module are copied to form the initial weights of the patched module" doesn't make sense to me because when the context manager is initiated a patched module does not exist yet. So it is unclear what we are copying from and to where (and why copying is something we want to do).

    Also, "unrolling the patched module" does not make sense to me. We usually unroll a computaiton graph caused by a for loop. A patched module is just a neural net that has been modified by this library. Unrolling is ambiguous.

    Also, there isn't a technical definition for "gradient tape".

    Also, when describing what False does, saying that it's useful for MAML isn't actually helpful, because it doesn't even hint at why it's useful for MAML.

    Overall, it's impossible to use the context manager because it's unclear what that flag is supposed to be doing (and it seems critical, so it's weird that it has a default value).
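
    To make the semantics concrete, here is a minimal sketch (my own, under the assumption of a toy linear model) of the behavioral difference the flag controls:

    import torch
    import torch.nn.functional as F
    import higher

    net = torch.nn.Linear(2, 1)
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    x, y = torch.randn(8, 2), torch.randn(8, 1)

    for copy_flag in (True, False):
        with higher.innerloop_ctx(net, opt, copy_initial_weights=copy_flag) as (fnet, diffopt):
            diffopt.step(F.mse_loss(fnet(x), y))     # one unrolled inner step
            outer_loss = F.mse_loss(fnet(x), y)
        grads = torch.autograd.grad(outer_loss, net.parameters(), allow_unused=True)
        print(copy_flag, [g is None for g in grads])
    # True  -> all None: the initial fast weights are detached copies, so the original
    #          parameters are not on the tape and get no meta-gradient.
    # False -> tensors: the original parameters serve as the time-0 fast weights, so the
    #          outer loss differentiates back to them, which is what MAML needs.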


    Related:

    • gitissue: https://github.com/facebookresearch/higher/issues/30
    • SO: https://stackoverflow.com/questions/60311183/what-does-the-copy-initial-weights-documentation-mean-in-the-higher-library-for
    • What does copy_initial_weights do in the higher library? https://discuss.pytorch.org/t/what-does-copy-initial-weights-do-in-the-higher-library/70384
    • Why does MAML need copy_initial_weights=False? https://discuss.pytorch.org/t/why-does-maml-need-copy-initial-weights-false/70387
    question 
    opened by renesax14 13
  • Memory leak when performing inner loop on a copy

    When running the following script, memory usage seems to continually increase (not all iterations, but many of them in the beginning). Explicitly decrementing the reference counts for the functional model/differentiable optimizer seems to help, but not completely fix the issue. Script for reproducing attached. Monitoring GPU memory usage with watch -n 0.1 nvidia-smi. Running PyTorch version 1.3.0, torchvision 0.4.1, higher version 0.1.3@59537fa. Let me know if any other information would be helpful.

    Code to reproduce the issue:

    import torch, torchvision
    import higher
    import time
    from copy import deepcopy
    
    
    print(torch.__version__, torchvision.__version__)
    
    inner_loop_copy = True # Setting this to False gives no memory usage increase
    
    device = torch.device('cuda:0')
    model = torchvision.models.resnet18().to(device)
    opt = torch.optim.SGD(model.parameters(), lr=1e-5)
    
    for idx in range(100):
        print(idx)
        if inner_loop_copy:
            model_ = deepcopy(model)
            opt_ = torch.optim.SGD(model_.parameters(), lr=1e-5)
        else:
            model_ = model
            opt_ = opt
    
        with higher.innerloop_ctx(model_, opt_) as (fm, do):
            pass
    
        # del fm, do # Uncommenting this helps, but doesn't completely eliminate the memory usage increase
    
    bug help wanted 
    opened by eric-mitchell 9
  • Why not accumulate loss and then take derivative in MAML?

    Why do you not do this:

    def inner_loop2():
        n_inner_iter = 5
        inner_opt = torch.optim.SGD(net.parameters(), lr=1e-1)
    
        qry_losses = []
        qry_accs = []
        meta_opt.zero_grad()
        meta_loss = 0
        for i in range(task_num):
            with higher.innerloop_ctx(
                net, inner_opt, copy_initial_weights=False
            ) as (fnet, diffopt):
                # Optimize the likelihood of the support set by taking
                # gradient steps w.r.t. the model's parameters.
                # This adapts the model's meta-parameters to the task.
                # higher is able to automatically keep copies of
                # your network's parameters as they are being updated.
                for _ in range(n_inner_iter):
                    spt_logits = fnet(x_spt[i])
                    spt_loss = F.cross_entropy(spt_logits, y_spt[i])
                    diffopt.step(spt_loss)
    
                # The final set of adapted parameters will induce some
                # final loss and accuracy on the query dataset.
                # These will be used to update the model's meta-parameters.
                qry_logits = fnet(x_qry[i])
                qry_loss = F.cross_entropy(qry_logits, y_qry[i])
                qry_losses.append(qry_loss.detach())
                qry_acc = (qry_logits.argmax(
                    dim=1) == y_qry[i]).sum().item() / querysz
                qry_accs.append(qry_acc)
    
                # Update the model's meta-parameters to optimize the query
                # losses across all of the tasks sampled in this batch.
                # This unrolls through the gradient steps.
                #qry_loss.backward()
                meta_loss += qry_loss
    
        qry_losses = sum(qry_losses) / task_num
        qry_losses.backward()
        meta_opt.step()
        qry_accs = 100. * sum(qry_accs) / task_num
        i = epoch + float(batch_idx) / n_train_iter
        iter_time = time.time() - start_time
    

    instead of what you have:

    def inner_loop1():
        n_inner_iter = 5
        inner_opt = torch.optim.SGD(net.parameters(), lr=1e-1)
    
        qry_losses = []
        qry_accs = []
        meta_opt.zero_grad()
        for i in range(task_num):
            with higher.innerloop_ctx(
                net, inner_opt, copy_initial_weights=False
            ) as (fnet, diffopt):
                # Optimize the likelihood of the support set by taking
                # gradient steps w.r.t. the model's parameters.
                # This adapts the model's meta-parameters to the task.
                # higher is able to automatically keep copies of
                # your network's parameters as they are being updated.
                for _ in range(n_inner_iter):
                    spt_logits = fnet(x_spt[i])
                    spt_loss = F.cross_entropy(spt_logits, y_spt[i])
                    diffopt.step(spt_loss)
    
                # The final set of adapted parameters will induce some
                # final loss and accuracy on the query dataset.
                # These will be used to update the model's meta-parameters.
                qry_logits = fnet(x_qry[i])
                qry_loss = F.cross_entropy(qry_logits, y_qry[i])
                qry_losses.append(qry_loss.detach())
                qry_acc = (qry_logits.argmax(
                    dim=1) == y_qry[i]).sum().item() / querysz
                qry_accs.append(qry_acc)
    
                # Update the model's meta-parameters to optimize the query
                # losses across all of the tasks sampled in this batch.
                # This unrolls through the gradient steps.
                qry_loss.backward()
    
        meta_opt.step()
        qry_losses = sum(qry_losses) / task_num
        qry_accs = 100. * sum(qry_accs) / task_num
        i = epoch + float(batch_idx) / n_train_iter
        iter_time = time.time() - start_time
    

    https://github.com/facebookresearch/higher/blob/e45c1a059e39a16fa016d37bc15397824c65547c/examples/maml-omniglot.py#L130


    https://stackoverflow.com/questions/62394411/why-not-accumulate-query-loss-and-then-take-derivative-in-maml-with-pytorch-and

    question 
    opened by renesax14 8
  • Computational graph not retained for BERT

    I'm trying to implement a first-order version of ProtoMAML (https://arxiv.org/pdf/1903.03096.pdf) for a sequence labelling task. If I use BERT as the encoder, I run into this error at the line diffopt.step: RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time. Instead, if I use an LSTM as the encoder, it runs successfully. Perhaps the graph is purged somehow since BERT is a large model?

    Here is self-contained code to replicate the issue. The issue occurs on both CPU and GPU. In the __main__ block, you can specify the encoder as bert or lstm. It requires the transformers library from HuggingFace to run.

    import higher
    import torch
    
    from torch import nn, optim
    from transformers import BertModel, BertTokenizer
    from torch.nn import functional as F
    
    
    class BaseModel(nn.Module):
    
        def __init__(self, encoder, max_length, device):
            super(BaseModel, self).__init__()
            self.max_length = max_length
            self.device = device
            if encoder == 'bert':
                self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
                self.encoder = BertModel.from_pretrained('bert-base-uncased')
                self.encoder.pooler.dense.weight.requires_grad = False
                self.encoder.pooler.dense.bias.requires_grad = False
            elif encoder == 'lstm':
                self.encoder = nn.LSTM(batch_first=True, input_size=32, hidden_size=768)
            self.linear = nn.Linear(768, 192)
            self.to(self.device)
    
        def encode_text(self, text):
            if isinstance(self.encoder, BertModel):
                encode_result = self.tokenizer.batch_encode_plus(text, return_token_type_ids=False, max_length=self.max_length,
                                                                 pad_to_max_length=True, return_tensors='pt')
                for key in encode_result:
                    encode_result[key] = encode_result[key].to(self.device)
                return encode_result
            elif isinstance(self.encoder, nn.LSTM):
                return torch.randn((len(text), 32, 32), device=self.device)
    
        def forward(self, inputs):
            if isinstance(self.encoder, BertModel):
                out, _ = self.encoder(inputs['input_ids'], attention_mask=inputs['attention_mask'])
            elif isinstance(self.encoder, nn.LSTM):
                out, _ = self.encoder(inputs)
            out = out[:, 1:-1, :]
            out = self.linear(out)
            return out
    
    
    class ProtoMAML:
    
        def __init__(self, device, encoder):
            self.output_layer_weight = None
            self.output_layer_bias = None
            self.learner = BaseModel(encoder=encoder, max_length=32, device=device)
            self.inner_optimizer = optim.SGD([p for p in self.learner.parameters() if p.requires_grad], lr=0.001)
            self.loss_fn = nn.CrossEntropyLoss()
            self.output_lr = 0.001
            self.device = device
            self.updates = 5
    
        def output_layer(self, input, weight, bias):
            return F.linear(input, self.output_layer_weight + weight, self.output_layer_bias + bias)
    
        def initialize_with_proto_weights(self, support_repr, support_label, n_classes):
            prototypes = self.build_prototypes(support_repr, support_label, n_classes)
            weight = 2 * prototypes
            bias = -torch.norm(prototypes, dim=1) ** 2
            self.output_layer_weight = torch.zeros_like(weight, requires_grad=True)
            self.output_layer_bias = torch.zeros_like(bias, requires_grad=True)
            return weight, bias
    
        def build_prototypes(self, data_repr, data_label, num_outputs):
            n_dim = data_repr.shape[2]
            data_repr = data_repr.view(-1, n_dim)
            data_label = data_label.view(-1)
    
            prototypes = torch.zeros((num_outputs, n_dim), device=self.device)
    
            for c in range(num_outputs):
                idx = torch.nonzero(data_label == c).view(-1)
                if idx.nelement() != 0:
                    prototypes[c] = torch.mean(data_repr[idx], dim=0)
    
            return prototypes
    
        def initialize_output_layer(self, n_classes):
            self.output_layer_weight = torch.randn((n_classes, 768), requires_grad=True)
            self.output_layer_bias = torch.randn(n_classes, requires_grad=True)
    
        def train(self, support_text, labels, n_classes, n_iter):
    
            for itr in range(n_iter):
                print('Iteration ', itr)
    
                self.learner.zero_grad()
    
                self.initialize_output_layer(n_classes)
                x = self.learner.encode_text(support_text)
                y = labels.to(device)
                output_repr = self.learner(x)
                init_weights, init_bias = self.initialize_with_proto_weights(output_repr, y, n_classes)
    
                with higher.innerloop_ctx(self.learner, self.inner_optimizer,
                                          copy_initial_weights=False,
                                          track_higher_grads=False) as (flearner, diffopt):
    
                    for i in range(self.updates):
                        output = flearner(x)
                        output = self.output_layer(output, init_weights, init_bias)
                        output = output.view(output.size()[0] * output.size()[1], -1)
                        loss = self.loss_fn(output, y)
                        output_weight_grad, output_bias_grad = torch.autograd.grad(loss, [self.output_layer_weight, self.output_layer_bias],
                                                                                   retain_graph=True)
                        self.output_layer_weight = self.output_layer_weight - self.output_lr * output_weight_grad
                        self.output_layer_bias = self.output_layer_bias - self.output_lr * output_bias_grad
                        diffopt.step(loss)
    
    
    if __name__ == '__main__':
        
        encoder = 'bert'  # or 'lstm'
    
        support_text = [['This is a support text']] * 64
        labels = torch.randint(0, 10, (64 * 30, ))
        n_classes = 10
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        model = ProtoMAML(device=device, encoder=encoder)
        model.train(support_text, labels, n_classes, n_iter=10)
    
    opened by Nithin-Holla 8
  • _cudnn_rnn_backward is not implemented

    Hi, I have the following error in my code, even though I am using torch 1.3: RuntimeError: derivative for _cudnn_rnn_backward is not implemented. I know that it is a PyTorch-related error. I am wondering in which version of PyTorch it has been resolved.

    Solved by using the following: with torch.backends.cudnn.flags(enabled=False):
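
    For context, a sketch of how that workaround might wrap an unrolled RNN inner loop (my own illustration; it assumes a CUDA device and the default innerloop_ctx arguments):

    import torch
    import torch.nn as nn
    import higher

    device = torch.device('cuda')
    net = nn.GRU(input_size=4, hidden_size=8, batch_first=True).to(device)
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    x = torch.randn(2, 5, 4, device=device)

    # Disabling cuDNN for the unrolled section falls back to the native RNN kernels,
    # which do implement the double backward that higher needs.
    with torch.backends.cudnn.flags(enabled=False):
        with higher.innerloop_ctx(net, opt) as (fnet, diffopt):
            out, _ = fnet(x)
            diffopt.step(out.pow(2).mean())      # this step needs double backward through the RNN
            outer_loss = fnet(x)[0].mean()
            outer_loss.backward()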

    opened by nooralahzadeh 8
  • Question about a module that does not require grad

    Thanks for the wonderful library!

    I have one question about how to use higher.

    I want to train the network with second-order derivatives, but some parts of the network (such as nn.Embedding) are frozen. Something like this:

    net = Net()
    param = filter(lambda x: x.requires_grad, net.parameters())
    inner_opt = torch.optim.SGD(param, lr=1e-1)

    with higher.innerloop_ctx(net, inner_opt, copy_initial_weights=False) as (fnet, diffopt):
        logits = fnet(batch_x)
        loss = loss_fnt(logits, batch_y)
        diffopt.step(loss)
    

    But I got the error message "RuntimeError: One of the differentiated Tensors does not require grad".

    bug 
    opened by seanie12 7
  • Weights of monkeypatched module get reset by a forward pass after being changed manually

    Any computed update to the weights of a monkeypatched module (not with a differentiable optimizer) doesn't behave as expected:

    data = torch.randn(32,10)  
    module = torch.nn.Linear(10,10)  
    fmodel = higher.monkeypatch(module,copy_initial_weights=True)  
    fmodel.UpdateWeights() # some update to the weights, for a simple example: fmodel.weight = fmodel.weight * 2.
    fmodel.forward(data) # weights are now reset as if fmodel.UpdateWeights() never happened
    

    Edit: I've noticed that fmodel.named_parameters() and fmodel._parameters contain the correct updated parameters whereas fmodel.parameters() and fmodel.fast_params contain the old parameter values
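
    If it helps, a sketch of two ways a manual update can be made visible to later forward passes, based on my reading of the patched-module API (update_params and the params keyword are assumptions on my part, not confirmed by the maintainers):

    import torch
    import higher

    data = torch.randn(32, 10)
    module = torch.nn.Linear(10, 10)
    fmodel = higher.monkeypatch(module, copy_initial_weights=True)

    new_params = [p * 2 for p in fmodel.parameters()]

    # Option 1: replace the functional parameter list that forward() reads from.
    fmodel.update_params(new_params)
    out = fmodel(data)

    # Option 2: pass the parameters explicitly for a single call.
    out = fmodel(data, params=new_params)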

    bug 
    opened by Horse7354 6
  • More complete example please?

    Hi, I don't suppose... could it be possible to have a more complete, yet still very simple, example of how to use higher, please?

    • ~~like how would we write MAML using higher? (I'm aware that some of the linked papers use MAML, but the implementation tends to be buried in a lot of paper-specific things, e.g. https://github.com/facebookresearch/LearningToLearn/blob/main/ml3/ml3_train.py#L17-L34 has a bunch of abstract functions etc., without descriptions provided of what they are, so one has to go hunting through the code to figure out what they do, etc. Similarly for https://github.com/Nithin-Holla/MetaWSD/blob/master/models/seq_meta.py#L120-L143~~ Ooohh, there is a MAML example at https://github.com/facebookresearch/higher/blob/master/examples/maml-omniglot.py, and I've created a PR to add a link to this in the readme: https://github.com/facebookresearch/higher/pull/109
    • if one has a loss function that takes as input parameters onto which one backpropped a few times, how would such an example look, in the general case?
      • like eg lets say one had neuralstyle, https://github.com/jcjohnson/neural-style , and one wanted to create a loss function based on the generated image, how would one do that using higher?

    PS super awesome repo :)

    opened by hughperkins 3
  • Is there data leakage in the maml-omniglot example?

    In the maml-omniglot.py example code, net.train() is used for meta-test phases (link).

    Does this not cause data leakage of meta-test data via the statistics of nn.BatchNorm2d (net contains several nn.BatchNorm2d)?

    opened by SunHaozhe 0
  • Tracking higher-order grads when arbitrarily combining submodules of functional module

    Hi, I'm implementing a bilevel optimization procedure to meta-learn an embedding and am struggling to find the best way to track higher-order gradients for all of my learned modules. The model in my full pipeline involves several torch modules which are combined in different ways depending on where in the pipeline the forward pass happens. I would like to meta-learn an embedding (optimized in the outer loop, possibly along with some of the model parameters), and optimize some model parameters in the inner loop (to produce "fast weights") where the inner loss is partially parameterized by this meta-learned embedding.

    Currently, to fit into the higher framework, I'm wrapping all of the modules for my model (in addition to the learned embedding) in a super-module, with each module as a submodule. I can then pass this super-module to a higher.innerloop_ctx to track higher order grads for all params of all of these modules. Ideally, I'd like for each of the submodules to be functional as well, so that I can combine modules arbitrarily by referring to them as fmodel.submodule1(...) and have higher order gradients computed in the backward pass. However, it seems like this is not supported by higher, and that the only way to do forward passes inside an innerloop_ctx is by calling the fmodel's forward function. This is undesirable in my use case, since it would require the forward function of the super-module to be very bloated with different conditions to account for all of the cases in which it could be called.

    My questions are as follows:

    1. Is there a way to make each of my submodules functional so that they can be combined arbitrarily and still track higher order gradients?
    2. Do I need to be using this super-module at all? Is it possible/simple to track higher order grads for several modules at once (allowing me to decouple my learned embedding from the rest of the model, for example)?

    Below is a minimal runnable example to illustrate the issue I'm facing. The inner and outer losses in the example are arbitrary and just provide something to optimize. The lines of interest are marked with # NOTE: comments. The approaches using the forward method of fmodel (the super-module) produce the desired results, the others do not.

    Essentially what I want is for fmodel.decoder(fmodel.encoder(frame)) to behave the same way as fmodel(frame) in the example below — is this possible with higher?

    import torch
    import torch.optim as optim
    import torch.nn as nn
    
    import higher
    
    device = torch.device('cpu')
    inner_lr = outer_lr = .001
    channels = 3
    batch_size = 7
    
    # random dataloader for debugging
    def batch_generator():
        while True:
            yield torch.rand((batch_size, channels, 64, 64), device=device)
    dataloader = batch_generator()
    
    
    class BasicModel(nn.Module):
        def __init__(self, nc, emb_dim):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(nc, 16, 3, stride=2, padding=1),
                nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1),
                nn.ReLU(),
                nn.Conv2d(32, 64, 7)
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 7),
                nn.ReLU(),
                nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
                nn.ReLU(),
                nn.ConvTranspose2d(16, nc, 3, stride=2, padding=1, output_padding=1),
                nn.Sigmoid()
            )
            self.emb = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 64 * nc, emb_dim)
            )
    
        def f(self, x):
            x = self.encoder(x)
            x = self.decoder(x)
            return x
        
        def forward(self, x):
            x = self.encoder(x)
            x = self.decoder(x)
            return x
        
    
    basic_model = BasicModel(channels, emb_dim=5).to(device)
    outer_opt = optim.Adam(basic_model.emb.parameters(), lr=outer_lr)
    outer_mse_crit = nn.MSELoss()
    
    
    for epoch in range(2):  # arbitrary number of epochs
        for batch_num in range(2):  # arbitrary number of outer steps
            frame = next(dataloader)
    
            inner_crit = nn.MSELoss()
            inner_opt = optim.SGD(list(basic_model.encoder.parameters()) +
                                  list(basic_model.decoder.parameters()), lr=inner_lr)
            
            with higher.innerloop_ctx(
                basic_model,
                inner_opt,
                track_higher_grads=True,
                copy_initial_weights=False,
            ) as (fmodel, diffopt):
                for inner_step in range(2):  # arbitrary number of inner steps
                    
                    # NOTE: These two ways of computing a forward pass behave as desired,
                    #       producing a gradient tape from `outer_loss` to emb params
                    # gen_frame = fmodel(frame)
                    # gen_frame = fmodel.forward(frame)
                    
                    # NOTE: There is no gradient path from `outer_loss` to embedding params
                    #       when forward pass is computed in either of the following ways:
                    gen_frame = fmodel.decoder(fmodel.encoder(frame))
                    # seems like only the `forward` method gets special attention
                    # gen_frame = fmodel.f(frame)
    
                    embedding = fmodel.emb(gen_frame)
                    # arbitrary simple stand-in loss to reproduce behavior
                    inner_loss = inner_crit(torch.zeros_like(embedding), embedding)
                    diffopt.step(inner_loss)
    
                # NOTE: These two ways of computing a forward pass behave as desired,
                #       producing a gradient tape from `outer_loss` to emb params
                # final_gen_frame = fmodel(frame)
                # final_gen_frame = fmodel.forward(frame)
                
                # NOTE: There is no gradient path from `outer_loss` to embedding params
                #       when forward pass is computed in either of the following ways:
                final_gen_frame = fmodel.decoder(fmodel.encoder(frame))
                # seems like only the `forward` method gets special attention
                # final_gen_frame = fmodel.f(frame)
    
            # arbitrary simple stand-in loss to reproduce behavior
            outer_loss = outer_mse_crit(torch.ones_like(final_gen_frame), final_gen_frame)
    
            # print gradients of `outer_loss` w.r.t. embedding parameters;
            #        `None` indicates no path
            dl_demb = torch.autograd.grad(outer_loss, basic_model.emb.parameters(),
                                          allow_unused=True, retain_graph=1)
            print(f'd(outer_loss)/d(emb): {dl_demb}')
    
            outer_opt.zero_grad()
            outer_loss.backward()
            outer_opt.step()
    
    opened by dylandoblar 0
  • Memory not freed when moving out of scope?

    Hi, thanks so much for providing this library!

    I am implementing a multi-step optimization problem where I am using two models (visual_encoder resnet and a coefficient_vector) to calculate a weighted training loss. Backpropagating this loss leads to an update of my main model weights. Then, using validation loss I'd like to update my visual_encoder resnet and coefficient_vector parameters with higher.

    The following function returns the gradients for those two models so that I can subsequently update them in another function.

    However, when calling logits = fmodel(input) there's very high memory allocation that is never freed after returning the gradients from the function. Is there anything I am doing wrong here? Which reference is kept that I am missing? Any hint is highly appreciated and my apologies if this is not the right place to ask for this.

        with higher.innerloop_ctx(model, optimizer) as (fmodel, foptimizer):
            
        logits = fmodel(input)  # heavy memory allocation here which is never freed
        weights = calc_instance_weights(input, target, input_val, target_val, logits, coefficient_vector, visual_encoder)  # returns a weight for each training sample (input)
        weighted_training_loss = torch.mean(weights * F.cross_entropy(logits, target, reduction='none'))
        foptimizer.step(weighted_training_loss)  # update the fmodel's main model weights
    
            logits = fmodel(input)
            meta_val_loss = F.cross_entropy(logits, target)
    
            coeff_vector_gradients = torch.autograd.grad(meta_val_loss, coefficient_vector, retain_graph=True) # get the gradients w.r.t. coefficient vector
            coeff_vector_gradients = coeff_vector_gradients[0].detach()
            visual_encoder_gradients = torch.autograd.grad(meta_val_loss,
                                                               visual_encoder.parameters())# get the gradients w.r.t. resnet parameters
            visual_encoder_gradients = (visual_encoder_gradients[0].detach(), visual_encoder_gradients[1].detach())
    
            return visual_encoder_gradients, coeff_vector_gradients
    

    Thanks a lot!

    opened by jessicamecht 0
  • When do we divide by meta_batch_size?

    Which one is the correct one to do:

                # Accumulate gradients wrt meta-params for each task
                qry_loss_t.backward()  # note this is more memory efficient (as it removes intermediate data that used to be needed since backward has already been called)
                # (qry_loss_t / meta_batch_size).backward()  # note this is more memory efficient (as it removes intermediate data that used to be needed since backward has already been called)
    

    ref: https://stackoverflow.com/questions/66606540/when-does-one-divide-by-the-meta-batch-size-for-maml-during-meta-learning
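
    Not an authoritative answer, but a tiny sketch of what the division changes: repeated backward() calls accumulate gradients, so dividing each task loss by meta_batch_size turns the summed meta-gradient into a mean, a rescaling that could equally be absorbed into the outer learning rate.

    import torch

    theta = torch.tensor(1.0, requires_grad=True)
    per_task_losses = [theta * 2.0, theta * 4.0]   # stand-ins for qry_loss_t of two tasks
    meta_batch_size = len(per_task_losses)

    for loss_t in per_task_losses:
        (loss_t / meta_batch_size).backward()      # accumulates the *mean* gradient
    print(theta.grad)                              # tensor(3.) = (2 + 4) / 2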

    opened by brando90 0
  • The fmodel can't transport the running_mean and running_var of BN layer to the original model

    The fmodel can't transport the running_mean and running_var to the original model, which may cause the loss to explode when the model is run on the meta-test dataset. Please fix it or give me some advice on how to address it gracefully. Thanks a lot!

    The sample code and related results are as follows:

        for i_task in range(args.n_batch):
            
            x_spti, x_qryi = torch.from_numpy(x_spt[i_task]).float().to(device), torch.from_numpy(x_qry[i_task]).float().to(device)
    
            fmodel = higher.patch.monkeypatch(model, device=device, copy_initial_weights=False, track_higher_grads=True)
    
            support_preds = fmodel(x_spti)
            support_loss = loss(support_preds, x_spti)
            grad = torch.autograd.grad(support_loss, fmodel.fast_params, create_graph=False,retain_graph=True,allow_unused=True)
            fast_weights = list(map(lambda p: p[1] -adapt_lr * p[0], zip(grad, fmodel.fast_params)))
    
            for step in range(1,args.adapt_steps):
                fmodel.fast_params = fast_weights
                support_preds = fmodel(x_spti)
                support_loss = loss(support_preds, x_spti)
    
                grad = torch.autograd.grad(support_loss, fmodel.fast_params, create_graph=False,retain_graph=True,allow_unused=True)
    
                fast_weights = list(map(lambda p: p[1] - adapt_lr * p[0], zip(grad, fmodel.fast_params)))
            fmodel.fast_params = fast_weights
            x_hat_qry = fmodel(x_qryi)
            iter_error_temp = loss(x_hat_qry,x_qryi)
            iter_error = iter_error + iter_error_temp
        iter_error = iter_error/args.n_batch
        iter_error.backward()
        optimizer.step()
    

    1. Related result:

    In [1]: fmodel.state_dict()['encoder.base1.conv1.bn.running_mean']
    Out[1]: tensor([ 0.0519, -0.1342, -0.1098, -0.0754, -0.0366, -0.0326, -0.0846, -0.0088, 0.0524, -0.0084, -0.0287, 0.0098, 0.0276, 0.0137, -0.0608, 0.0782, -0.0773, 0.1453, -0.0556, 0.0077, -0.0784, 0.1525, -0.0749, -0.1261, -0.0518, 0.0650, 0.0663, -0.0138, -0.0267, 0.0617, -0.0196, 0.0009], device='cuda:0')

    In [2]: model.state_dict()['encoder.base1.conv1.bn.running_mean']
    Out[2]: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')

    2. The final results: a comparison between training with and without running_mean and running_var (figure omitted).

    The orange line is the model trained with the help of the higher lib, and the blue line is the model trained with another lib. Since the same random seed is used, both models' parameters are identical except for running_mean and running_var. It can be seen that at the very beginning there was an obvious loss explosion. I don't know how to address it gracefully other than passing the parameters layer by layer manually.
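
    One possible workaround (my own sketch, relying only on fmodel.state_dict(), which the snippet above already uses): after the unrolled adaptation, copy the updated BatchNorm buffers back onto the original module before meta-testing.

    import torch

    # Write the patched module's buffers (running_mean, running_var,
    # num_batches_tracked) back to the original model.
    with torch.no_grad():
        fstate = fmodel.state_dict()
        for name, buf in model.named_buffers():
            buf.copy_(fstate[name])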

    opened by ensiwalk 0
  • Why DifferentiableOptimizer detaches parameters when track_higher_grads = False?

    Hi! Thank you for this awesome library, it helps me a lot.

    I am not sure whether I'm missing something, but I'm confused about why DifferentiableOptimizer detaches parameters when track_higher_grads = False:

    https://github.com/facebookresearch/higher/blob/1e20cf9696054277b2d760f64835d5d74a3115a2/higher/optim.py#L251-L257

    which cuts the gradient path back to the original model parameters, even though copy_initial_weights=False. When we set copy_initial_weights=False, we want to allow gradients to flow back to the original model parameters, but line 257 cuts off the gradient flow.

    In my use case, I want to implement something like FOMAML, and here is a simplified version of my code:

    def inner_loop(self, fmodel, diffopt, train_input, train_target):
        # ...
    
    def outer_loop(self, task_batch):
        self.out_optim.zero_grad()
    
        for task_data in task_batch:
            support_input, support_target, query_input, query_target = task_data
    
            with higher.innerloop_ctx(
                self.model, self.in_optim, copy_initial_weights=False, track_higher_grads=False
            ) as (fmodel, diffopt):
                self.inner_loop(fmodel, diffopt, support_input, support_target)
    
                query_output = fmodel(query_input)
                query_loss = F.cross_entropy(query_output, query_target)
                query_loss.backward()
    
        for param in self.model.parameters():
            print(param.grad)  # output: None
        self.out_optim.step()
    

    The gradients were not propagated back to the original parameters. My code works well after I edited higher's code to:

    new_params = params[:]
    for group, mapping in zip(self.param_groups, self._group_to_param_list):
        for p, index in zip(group['params'], mapping):
            new_params[index] = p
    

    I know this problem can be solved by manually mapping the gradients, but I just wonder why detaching the parameters is necessary here. And thank you for your nice work again!
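
    For what it's worth, a common first-order workaround that avoids editing higher is to let backward() populate the grads of the final fast weights and then copy them onto the original parameters by hand. A self-contained sketch (it assumes fnet.parameters() and net.parameters() line up one-to-one):

    import torch
    import torch.nn.functional as F
    import higher

    net = torch.nn.Linear(4, 2)
    inner_opt = torch.optim.SGD(net.parameters(), lr=0.1)
    outer_opt = torch.optim.SGD(net.parameters(), lr=0.01)
    x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))

    outer_opt.zero_grad()
    with higher.innerloop_ctx(net, inner_opt, copy_initial_weights=False,
                              track_higher_grads=False) as (fnet, diffopt):
        diffopt.step(F.cross_entropy(fnet(x), y))    # inner adaptation step
        query_loss = F.cross_entropy(fnet(x), y)
        query_loss.backward()                        # grads land on the detached fast weights

        # First-order trick: copy those grads onto the original parameters by hand.
        with torch.no_grad():
            for p, fp in zip(net.parameters(), fnet.parameters()):
                if fp.grad is not None:
                    p.grad = fp.grad.clone() if p.grad is None else p.grad + fp.grad
    outer_opt.step()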

    opened by Renovamen 1
  • Is the higher library compatible with pytorch's distributed RPC?

    ref: https://pytorch.org/tutorials/intermediate/rpc_tutorial.html

    motivation: https://discuss.pytorch.org/t/how-to-parallelize-a-loop-over-the-samples-of-a-batch/32698/19

    related: https://github.com/tristandeleu/pytorch-meta/issues/116

    wontfix 
    opened by brando90 6
  • How to evaluate model without gpu memory issues?

    I am using the MAML example as a template but I am additionally computing on a GPU. Normally I would use torch.no_grad to do model evaluation without overflowing GPU memory -- I obviously cannot do that with meta-learning! What is standard practice for allowing model eval during training?
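
    One common pattern, sketched below with names borrowed from the maml-omniglot example (x_spt/y_spt/x_qry/y_qry are assumed to be given): the meta-test adaptation does not need meta-gradients, so unrolling can be switched off and only the query-set evaluation wrapped in no_grad.

    import torch
    import torch.nn.functional as F
    import higher

    def evaluate_task(net, x_spt, y_spt, x_qry, y_qry, n_inner_iter=5):
        inner_opt = torch.optim.SGD(net.parameters(), lr=1e-1)
        # track_higher_grads=False: no unrolled graph is retained, so memory stays flat.
        with higher.innerloop_ctx(net, inner_opt, track_higher_grads=False) as (fnet, diffopt):
            for _ in range(n_inner_iter):
                diffopt.step(F.cross_entropy(fnet(x_spt), y_spt))
            with torch.no_grad():                    # the query pass needs no graph at all
                qry_logits = fnet(x_qry)
                return (qry_logits.argmax(dim=1) == y_qry).float().mean().item()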

    opened by njwfish 0