Last edited by kiara on 2024-10-10 11:44
Policy comparison: Task 1 sim_transfer_cube_scripted
Diffusion Policy
With `ENABLE_EMA = True`, training crashes at the line `nets = self.ema.averaged_model` (likely a diffusers version mismatch: newer `EMAModel` no longer exposes an `averaged_model` attribute), so we set `ENABLE_EMA = False`, after which training runs.
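For context on what that flag controls: EMA keeps a slowly-updated shadow copy of the model weights and evaluates with those instead of the raw weights. A minimal sketch of the mechanism (illustrative names only, not the repo's actual API):

```python
class SimpleEMA:
    """Exponential moving average of model weights.

    Illustrative stand-in for the EMA helper that ENABLE_EMA toggles;
    real implementations (e.g. diffusers' EMAModel) work on torch tensors.
    """

    def __init__(self, params, decay=0.999):
        self.decay = decay
        # Shadow copy of the weights, updated slowly toward the live weights.
        self.shadow = list(params)

    def step(self, params):
        # shadow <- decay * shadow + (1 - decay) * current
        self.shadow = [
            self.decay * s + (1.0 - self.decay) * p
            for s, p in zip(self.shadow, params)
        ]
        return self.shadow
```

At evaluation time the shadow weights are used, which smooths out late-training noise; disabling EMA means evaluating the raw weights, which is why it is flagged below as a possible cause of the poor success rates.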
ACT reached a 20% success rate after 2000 steps. We now train Diffusion Policy for the same 2000 steps:
- Training finished:
- Seed 0, val loss 0.069448 at step 2000
- Best ckpt, val loss 0.069448 @ step2000
- policy_last.ckpt: success_rate=0.0 avg_return=0.4
Diffusion Policy 5000 steps:
- Training finished:
- Seed 0, val loss 0.031175 at step 5000
- Best ckpt, val loss 0.031175 @ step5000
- policy_last.ckpt: success_rate=0.0 avg_return=0.4
Diffusion Policy 40000 steps:
- Training finished:
- Seed 0, val loss 0.010312 at step 31500
- Best ckpt, val loss 0.010312 @ step31500
- policy_last.ckpt: success_rate=0.0 avg_return=0.0
Diffusion Policy 160000 steps:
- Training finished:
- Seed 0, val loss 0.009454 at step 152500
- Best ckpt, val loss 0.009454 @ step152500
- policy_last.ckpt: success_rate=0.0 avg_return=0.2
The results are poor, and the effect of disabling EMA cannot be ruled out.
Policy comparison: Task 2 sim_insertion_scripted
As in task 1, we first collect scripted demonstrations in simulation:
Success: 48 / 50
ACT
2000 steps:
- Training finished:
- Seed 0, val loss 0.245840 at step 1500
- Best ckpt, val loss 0.245840 @ step1500
- policy_last.ckpt: success_rate=0.3 avg_return=228.3
5000 steps:
- Training finished:
- Seed 0, val loss 0.142431 at step 5000
- Best ckpt, val loss 0.142431 @ step5000
- policy_last.ckpt: success_rate=0.5 avg_return=366.1
Diffusion Policy
2000 steps:
- Training finished:
- Seed 0, val loss 0.070294 at step 2000
- Best ckpt, val loss 0.070294 @ step2000
- policy_last.ckpt: success_rate=0.0 avg_return=0.0
5000 steps:
- policy_last.ckpt: success_rate=0.0 avg_return=0.0
复制代码
As in task 1, Diffusion Policy underperforms ACT.
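For a side-by-side view, the success rates reported above can be collected into one structure (numbers taken directly from the runs in this post):

```python
# Success rates reported above, keyed by (policy, training steps).
results = {
    "sim_transfer_cube_scripted": {
        ("ACT", 2000): 0.2,
        ("DiffusionPolicy", 2000): 0.0,
        ("DiffusionPolicy", 5000): 0.0,
        ("DiffusionPolicy", 40000): 0.0,
        ("DiffusionPolicy", 160000): 0.0,
    },
    "sim_insertion_scripted": {
        ("ACT", 2000): 0.3,
        ("ACT", 5000): 0.5,
        ("DiffusionPolicy", 2000): 0.0,
        ("DiffusionPolicy", 5000): 0.0,
    },
}

# At matched step counts, ACT beats the (EMA-disabled) Diffusion Policy
# on both scripted tasks.
for task, runs in results.items():
    assert runs[("ACT", 2000)] > runs[("DiffusionPolicy", 2000)]
```

The caveat from task 1 still applies: these Diffusion Policy numbers were obtained with EMA disabled, so they may understate what the policy can do.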