I'm trying to implement a first-order version of ProtoMAML (https://arxiv.org/pdf/1903.03096.pdf) for a sequence labelling task. If I use BERT as the encoder, I run into the following error at the line diffopt.step(loss):

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

If I instead use an LSTM as the encoder, training runs without error. Could the graph somehow be getting freed because BERT is such a large model?
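For context, here is a minimal sketch (plain PyTorch, independent of higher and BERT) of the generic situation that produces this message: backpropagating a second time through a graph whose saved buffers were already freed by the first backward pass.

import torch

x = torch.randn(3, requires_grad=True)
y = (x * x).sum()
y.backward()  # frees the graph's saved intermediate buffers
y.backward()  # raises the same RuntimeError unless the first call used retain_graph=True

In my code below, however, diffopt.step(loss) is only called once per computed loss, and the explicit torch.autograd.grad call already passes retain_graph=True, so it's not obvious to me where a second backward through the same graph happens.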
Here is self-contained code that replicates the issue; it occurs on both CPU and GPU. In the __main__ block at the bottom you can set the encoder to bert or lstm. The transformers library from HuggingFace is required to run it.
import higher
import torch
from torch import nn, optim
from transformers import BertModel, BertTokenizer
from torch.nn import functional as F

class BaseModel(nn.Module):
    def __init__(self, encoder, max_length, device):
        super(BaseModel, self).__init__()
        self.max_length = max_length
        self.device = device
        if encoder == 'bert':
            self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
            self.encoder = BertModel.from_pretrained('bert-base-uncased')
            self.encoder.pooler.dense.weight.requires_grad = False
            self.encoder.pooler.dense.bias.requires_grad = False
        elif encoder == 'lstm':
            self.encoder = nn.LSTM(batch_first=True, input_size=32, hidden_size=768)
        self.linear = nn.Linear(768, 192)
        self.to(self.device)

    def encode_text(self, text):
        if isinstance(self.encoder, BertModel):
            encode_result = self.tokenizer.batch_encode_plus(
                text, return_token_type_ids=False, max_length=self.max_length,
                pad_to_max_length=True, return_tensors='pt')
            for key in encode_result:
                encode_result[key] = encode_result[key].to(self.device)
            return encode_result
        elif isinstance(self.encoder, nn.LSTM):
            # dummy random features stand in for real LSTM inputs in this repro
            return torch.randn((len(text), 32, 32), device=self.device)

    def forward(self, inputs):
        if isinstance(self.encoder, BertModel):
            out, _ = self.encoder(inputs['input_ids'], attention_mask=inputs['attention_mask'])
        elif isinstance(self.encoder, nn.LSTM):
            out, _ = self.encoder(inputs)
        out = out[:, 1:-1, :]
        out = self.linear(out)
        return out


class ProtoMAML:
    def __init__(self, device, encoder):
        self.output_layer_weight = None
        self.output_layer_bias = None
        self.learner = BaseModel(encoder=encoder, max_length=32, device=device)
        self.inner_optimizer = optim.SGD([p for p in self.learner.parameters() if p.requires_grad], lr=0.001)
        self.loss_fn = nn.CrossEntropyLoss()
        self.output_lr = 0.001
        self.device = device
        self.updates = 5

    def output_layer(self, input, weight, bias):
        return F.linear(input, self.output_layer_weight + weight, self.output_layer_bias + bias)

    def initialize_with_proto_weights(self, support_repr, support_label, n_classes):
        # ProtoMAML output-layer init from prototypes: W_k = 2 * c_k, b_k = -||c_k||^2
        prototypes = self.build_prototypes(support_repr, support_label, n_classes)
        weight = 2 * prototypes
        bias = -torch.norm(prototypes, dim=1) ** 2
        self.output_layer_weight = torch.zeros_like(weight, requires_grad=True)
        self.output_layer_bias = torch.zeros_like(bias, requires_grad=True)
        return weight, bias

    def build_prototypes(self, data_repr, data_label, num_outputs):
        n_dim = data_repr.shape[2]
        data_repr = data_repr.view(-1, n_dim)
        data_label = data_label.view(-1)
        prototypes = torch.zeros((num_outputs, n_dim), device=self.device)
        for c in range(num_outputs):
            idx = torch.nonzero(data_label == c).view(-1)
            if idx.nelement() != 0:
                prototypes[c] = torch.mean(data_repr[idx], dim=0)
        return prototypes

    def initialize_output_layer(self, n_classes):
        self.output_layer_weight = torch.randn((n_classes, 768), requires_grad=True)
        self.output_layer_bias = torch.randn(n_classes, requires_grad=True)

    def train(self, support_text, labels, n_classes, n_iter):
        for itr in range(n_iter):
            print('Iteration ', itr)
            self.learner.zero_grad()
            self.initialize_output_layer(n_classes)
            x = self.learner.encode_text(support_text)
            y = labels.to(self.device)
            output_repr = self.learner(x)
            init_weights, init_bias = self.initialize_with_proto_weights(output_repr, y, n_classes)
            with higher.innerloop_ctx(self.learner, self.inner_optimizer,
                                      copy_initial_weights=False,
                                      track_higher_grads=False) as (flearner, diffopt):
                for i in range(self.updates):
                    output = flearner(x)
                    output = self.output_layer(output, init_weights, init_bias)
                    output = output.view(output.size()[0] * output.size()[1], -1)
                    loss = self.loss_fn(output, y)
                    output_weight_grad, output_bias_grad = torch.autograd.grad(
                        loss, [self.output_layer_weight, self.output_layer_bias], retain_graph=True)
                    self.output_layer_weight = self.output_layer_weight - self.output_lr * output_weight_grad
                    self.output_layer_bias = self.output_layer_bias - self.output_lr * output_bias_grad
                    # the RuntimeError quoted above is raised here when encoder == 'bert'
                    diffopt.step(loss)


if __name__ == '__main__':
    encoder = 'bert'  # or 'lstm'
    support_text = [['This is a support text']] * 64
    labels = torch.randint(0, 10, (64 * 30, ))
    n_classes = 10
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = ProtoMAML(device=device, encoder=encoder)
    model.train(support_text, labels, n_classes, n_iter=10)