
Policy Selection: Why Did Aloha Choose ACT?

Posted 2024-10-10 11:39:00
Last edited by kiara on 2024-10-10 11:44

Policy comparison: Task 1 sim_transfer_cube_scripted

Diffusion Policy

With "ENABLE_EMA = True", training crashes with an error at "nets = self.ema.averaged_model", so we changed it to "ENABLE_EMA = False", which lets training run.
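The error suggests the EMA wrapper is never constructed on the path that later dereferences `self.ema.averaged_model`. Rather than disabling EMA outright, one fix is to fall back to the raw networks when no EMA object exists. A minimal sketch of that guard; `ENABLE_EMA`, `self.ema`, and `self.nets` follow the diffusion-policy training script's naming, while the `EMAModel` stand-in here is ours:

```python
class EMAModel:
    """Stand-in for the real EMA wrapper, which keeps a smoothed weight copy."""
    def __init__(self, model):
        self.averaged_model = model


class DiffusionPolicy:
    def __init__(self, nets, enable_ema):
        self.nets = nets
        # The crash happens when this stays None but eval still reads
        # self.ema.averaged_model unconditionally.
        self.ema = EMAModel(nets) if enable_ema else None

    def eval_nets(self):
        # Guard: use the EMA-averaged weights when available,
        # otherwise fall back to the raw weights instead of crashing.
        if self.ema is not None:
            return self.ema.averaged_model
        return self.nets
```

With this guard, `ENABLE_EMA = False` evaluates the raw weights and `ENABLE_EMA = True` evaluates the averaged copy, without touching the crash site.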

ACT reached a 20% success rate after 2000 steps; now we train Diffusion Policy for the same 2000 steps:
Training finished:
Seed 0, val loss 0.069448 at step 2000
Best ckpt, val loss 0.069448 @ step2000

--eval:

policy_last.ckpt: success_rate=0.0 avg_return=0.4
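The `success_rate` and `avg_return` figures in these logs are presumably aggregated over a batch of evaluation rollouts. A minimal sketch of that aggregation, assuming the ALOHA-sim convention that reaching the maximum per-step reward marks task success; the function name and structure here are ours:

```python
def summarize_rollouts(episode_rewards, success_reward):
    """Aggregate eval rollouts into (success_rate, avg_return).

    episode_rewards: one list of per-step rewards per episode.
    success_reward:  the reward value that signals full task success.
    """
    returns = [sum(rewards) for rewards in episode_rewards]
    # An episode counts as a success if it ever hit the success reward.
    successes = [max(rewards) == success_reward for rewards in episode_rewards]
    return sum(successes) / len(successes), sum(returns) / len(returns)
```

Under this reading, a nonzero `avg_return` with `success_rate=0.0` (as above) means the policy earned partial reward in some episodes but never completed the task.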

Diffusion Policy 5000 steps:
Training finished:
Seed 0, val loss 0.031175 at step 5000
Best ckpt, val loss 0.031175 @ step5000

--eval:

policy_last.ckpt: success_rate=0.0 avg_return=0.4

Diffusion Policy 40000 steps:
Training finished:
Seed 0, val loss 0.010312 at step 31500
Best ckpt, val loss 0.010312 @ step31500

--eval:

policy_last.ckpt: success_rate=0.0 avg_return=0.0

Diffusion Policy 160000 steps:
Training finished:
Seed 0, val loss 0.009454 at step 152500
Best ckpt, val loss 0.009454 @ step152500

--eval:

policy_last.ckpt: success_rate=0.0 avg_return=0.2

The results are not ideal, and we cannot rule out the impact of disabling EMA.
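The caveat matters because diffusion policies are normally evaluated with an exponential moving average of the weights, which smooths out step-to-step training noise; with `ENABLE_EMA = False` the raw, noisier weights are evaluated instead. A minimal sketch of the per-step EMA update that was forfeited (the decay value is illustrative, not the codebase's setting):

```python
def ema_update(ema_params, params, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * current weights.

    With decay near 1, the averaged weights change slowly, so evaluation
    sees a smoothed trajectory of the training run rather than whatever
    the last gradient step produced.
    """
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]
```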

Policy comparison: Task 2 sim_insertion_scripted

As with Task 1, we first collect scripted demonstration episodes in simulation:



Success: 48 / 50


ACT

2000 steps:

Training finished:
Seed 0, val loss 0.245840 at step 1500
Best ckpt, val loss 0.245840 @ step1500

policy_last.ckpt: success_rate=0.3 avg_return=228.3

5000 steps:
Training finished:
Seed 0, val loss 0.142431 at step 5000
Best ckpt, val loss 0.142431 @ step5000

policy_last.ckpt: success_rate=0.5 avg_return=366.1

Diffusion Policy
2000 steps:
Training finished:
Seed 0, val loss 0.070294 at step 2000
Best ckpt, val loss 0.070294 @ step2000

policy_last.ckpt: success_rate=0.0 avg_return=0.0

5000 steps:
policy_last.ckpt: success_rate=0.0 avg_return=0.0

As in Task 1, Diffusion Policy's performance falls short of ACT's.
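To see the comparison at a glance, here are the numbers from the logs above gathered side by side. This is a quick summary script written for this post, not part of the original experiments; all values are copied verbatim from the logs:

```python
# (task, policy, training steps) -> eval metrics, copied from the logs above.
results = {
    ("sim_transfer_cube_scripted", "ACT", 2000): {"success_rate": 0.2},
    ("sim_transfer_cube_scripted", "Diffusion", 2000): {"success_rate": 0.0, "avg_return": 0.4},
    ("sim_transfer_cube_scripted", "Diffusion", 5000): {"success_rate": 0.0, "avg_return": 0.4},
    ("sim_transfer_cube_scripted", "Diffusion", 40000): {"success_rate": 0.0, "avg_return": 0.0},
    ("sim_transfer_cube_scripted", "Diffusion", 160000): {"success_rate": 0.0, "avg_return": 0.2},
    ("sim_insertion_scripted", "ACT", 2000): {"success_rate": 0.3, "avg_return": 228.3},
    ("sim_insertion_scripted", "ACT", 5000): {"success_rate": 0.5, "avg_return": 366.1},
    ("sim_insertion_scripted", "Diffusion", 2000): {"success_rate": 0.0, "avg_return": 0.0},
    ("sim_insertion_scripted", "Diffusion", 5000): {"success_rate": 0.0, "avg_return": 0.0},
}

# Best success rate per (task, policy), across all step counts tried.
best = {}
for (task, policy, steps), metrics in results.items():
    key = (task, policy)
    if key not in best or metrics["success_rate"] > best[key]:
        best[key] = metrics["success_rate"]

for (task, policy), rate in sorted(best.items()):
    print(f"{task:28s} {policy:10s} best success_rate={rate}")
```

Even at 160000 steps, Diffusion Policy never completed either task in these runs, while ACT reached 20% and 50% within 2000 and 5000 steps respectively.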





Comments (2)

胡言乱语阿瑞基 Lv.7, posted 2024-10-16 14:51:11:
Well-argued and well-supported; I learned a lot.

王佳乐 Lv.1, posted 2024-10-17 16:37:26:
Impressive! Thanks, OP!

Copyright © 2025 OPENLOONG. All Rights Reserved. Powered by Discuz!