Differences Between O3 and O0 Optimization on Linux

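Overview: While implementing a simulation of the RISC-V ISA P-extension instructions, I ran into a strange problem: on Linux, the same program gave different results when compiled with -O3 and with -O0. After some digging, the problem turned out to be tied to how the optimizer treats accesses through short (int16_t) pointers. At first I could not decide whether to call this a bug or over-optimization, since a compiler should produce correct results whatever it optimizes. The catch is that this guarantee only covers programs without undefined behavior, and the pointer casts used below most likely break C++'s strict aliasing rule. The problem is described below.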

Linux test environment:

$ uname -a
Linux oberon 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ c++ --version 
c++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Test case:

#include <cstdint>  // uint64_t, int16_t, uint32_t
#include <cstdio>   // printf

int main() {
  uint64_t rd = 0;
  uint64_t rs1 = 0x0101010101;
  uint64_t rs2 = 0x1010101010;
  int16_t *rs1_p = (int16_t *)&rs1;
  int16_t *rs2_p = (int16_t *)&rs2;
  int16_t *rd_p = (int16_t *)&rd;
  for (uint32_t i = 0; i < 4; ++i) {
    *(rd_p + i) = *(rs1_p + i) + *(rs2_p + i);
    printf("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n");
    printf("---------------- rd_p= 0x%x ------------\n", *(rd_p + i));
    printf("---------------- rd = 0x%lx -------------\n", rd);
    printf("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n");
  }
  return 0;
}

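A quick explanation of the code above: the P extension consists of SIMD instructions. Each one splits its two uint64_t operands into several uint8_t, uint16_t or uint32_t lanes and computes on every lane independently. Since the per-lane logic is identical, I chose a for loop to handle it.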
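To make the lane idea concrete, here is a minimal sketch of a 16-bit lane-wise add written with shifts and masks only, so no pointer casts are involved. The function name add16_lanes is made up for illustration and is not part of the simulator, and I am assuming the plain wrap-around add is the operation being modeled.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original simulator): adds the four
// 16-bit lanes of a and b independently. Each lane wraps around on its
// own, and no carry crosses a lane boundary.
std::uint64_t add16_lanes(std::uint64_t a, std::uint64_t b) {
  std::uint64_t result = 0;
  for (unsigned lane = 0; lane < 4; ++lane) {
    const unsigned shift = lane * 16;
    // Extract the current lane from each operand.
    const std::uint64_t x = (a >> shift) & 0xFFFFu;
    const std::uint64_t y = (b >> shift) & 0xFFFFu;
    // Add, then truncate back to 16 bits so the carry is dropped.
    const std::uint64_t sum = (x + y) & 0xFFFFu;
    result |= sum << shift;
  }
  return result;
}
```

With the operands from the test case, add16_lanes(0x0101010101, 0x1010101010) gives 0x1111111111, which is the value the -O0 build prints for rd after the last iteration.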

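I admit this way of writing it is a bit odd. It could be done with the builtin functions for the x86 SSE instruction set, but that seemed like more trouble, so a for loop will do for now. The consequence is that the code needs pointers to reach int16_t-sized pieces of memory before adding them.

Addendum: when I wrote this I assumed C++ allows such casts, and corrections are welcome. As far as I can now tell it does not: reading or writing a uint64_t object through an int16_t* violates the strict aliasing rule and is undefined behavior.

Compiling and testing with O3 and O0:

On Linux this is straightforward: just build two ELF binaries, one with -O3 and one with -O0.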

$ c++ -O3 test3.cpp -o test_O3
$ c++ -O0 test3.cpp -o test_O0

Result:

$ ./test_O0 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x1111 ------------
---------------- rd = 0x1111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x1111 ------------
---------------- rd = 0x11111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x11 ------------
---------------- rd = 0x1111111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x0 ------------
---------------- rd = 0x1111111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
$ ./test_O3 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x3e0a ------------
---------------- rd = 0x0 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x5db6 ------------
---------------- rd = 0x0 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x5dcf ------------
---------------- rd = 0x0 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0xffffd128 ------------
---------------- rd = 0x0 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

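Analysis:

The -O3 result clearly differs from the -O0 one. My first guess was that -O3 eliminates intermediate variables and prefers values it already holds in registers over values in memory. That matches the symptom: rd always prints as 0, as if the compiler never reloads it after the stores through rd_p. The likely reason it is allowed to do so is strict aliasing. GCC enables -fstrict-aliasing at -O2 and above, and under that rule a store through an int16_t* is assumed not to modify a uint64_t object. The garbage values printed for rd_p presumably come from the same assumption applied to rs1 and rs2. Since the accesses are undefined behavior, neither output counts as "wrong" as far as the standard is concerned; building with -fno-strict-aliasing turns this assumption off.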
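One way to sidestep the aliasing question is to copy the bytes into a real array of 16-bit values with std::memcpy, compute there, and copy the result back. This is only a sketch: add16_memcpy is a name I made up, and it uses uint16_t lanes rather than int16_t so that the wrap-around is well defined (the resulting bits are the same).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: same lane-wise add as the test case, but the
// lanes are accessed through memcpy instead of casted pointers.
std::uint64_t add16_memcpy(std::uint64_t rs1, std::uint64_t rs2) {
  std::uint16_t a[4], b[4], d[4];
  static_assert(sizeof(a) == sizeof(rs1), "four 16-bit lanes must fill 64 bits");

  // Copy the object representation of each operand into lane arrays.
  std::memcpy(a, &rs1, sizeof(a));
  std::memcpy(b, &rs2, sizeof(b));

  for (int i = 0; i < 4; ++i) {
    // Operands are promoted to int; the cast truncates back to 16 bits.
    d[i] = static_cast<std::uint16_t>(a[i] + b[i]);
  }

  // Copy the lanes back into a 64-bit result.
  std::uint64_t rd = 0;
  std::memcpy(&rd, d, sizeof(rd));
  return rd;
}
```

Compilers generally turn these fixed-size memcpy calls into plain loads and stores, so this form should not cost anything at -O3, and the result no longer depends on the optimization level.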

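So pointer accesses through short get optimized away. What about other types? I replaced int16_t with int8_t. The code is not repeated here, since only the pointer type changes (and, as the output shows, the loop now runs 8 times instead of 4). The results: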

$ ./test_O3 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x11 ------------
---------------- rd = 0x11 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x11 ------------
---------------- rd = 0x1111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x11 ------------
---------------- rd = 0x111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x11 ------------
---------------- rd = 0x11111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x11 ------------
---------------- rd = 0x1111111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x0 ------------
---------------- rd = 0x1111111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x0 ------------
---------------- rd = 0x1111111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x0 ------------
---------------- rd = 0x1111111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
$ ./test_O0 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x11 ------------
---------------- rd = 0x11 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x11 ------------
---------------- rd = 0x1111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x11 ------------
---------------- rd = 0x111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x11 ------------
---------------- rd = 0x11111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x11 ------------
---------------- rd = 0x1111111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x0 ------------
---------------- rd = 0x1111111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x0 ------------
---------------- rd = 0x1111111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x0 ------------
---------------- rd = 0x1111111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
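The int8_t case probably behaves because byte-sized character types are treated specially: the C++ standard lets char, unsigned char and std::byte access the bytes of any object, and GCC appears to extend the same treatment to signed char, which is what int8_t is on this platform. A sketch of the 8-bit variant that stays inside what the standard spells out, using unsigned char (add8_bytes is a hypothetical name):

```cpp
#include <cstdint>

// Hypothetical helper: byte-wise add of two 64-bit values. Accessing an
// object's bytes through unsigned char* is permitted by the aliasing rules.
std::uint64_t add8_bytes(std::uint64_t rs1, std::uint64_t rs2) {
  std::uint64_t rd = 0;
  const unsigned char* a = reinterpret_cast<const unsigned char*>(&rs1);
  const unsigned char* b = reinterpret_cast<const unsigned char*>(&rs2);
  unsigned char* d = reinterpret_cast<unsigned char*>(&rd);

  for (unsigned i = 0; i < sizeof(rd); ++i) {
    // Each byte wraps around independently of its neighbours.
    d[i] = static_cast<unsigned char>(a[i] + b[i]);
  }
  return rd;
}
```

Because byte i of the result only depends on byte i of each operand, the returned value does not depend on the machine's byte order.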

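Solution: volatile

With int8_t, the -O3 and -O0 results agree and the over-optimization problem does not appear. I am not a compiler expert, so at this point I could not judge whether the int16_t behavior was a compiler bug or whether my own limited knowledge meant I was simply missing a compiler flag.

Still, a problem has to be solved. After all, neither the program nor I feel like running away!!!

Ding: enter C++'s magic keyword, volatile. See the relevant documentation for details.

Put simply, when volatile qualifies an object, it tells the compiler not to attempt clever optimizations on it because its value may change at any time, so every use has to fetch it from memory. ("Memory" here includes the cache. Cache coherence keeps the two consistent, so the access does not really go all the way to main memory unless there is a cache miss.) One caveat: that guarantee applies to accesses made through volatile-qualified lvalues. The casts below drop the qualifier and still go through int16_t*, so this fix works with this compiler but is not something the standard promises.

Modified code: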

#include <cstdint>  // uint64_t, int16_t, uint32_t
#include <cstdio>   // printf

int main() {
  volatile uint64_t rd = 0;
  volatile uint64_t rs1 = 0x0101010101;
  volatile uint64_t rs2 = 0x1010101010;
  int16_t *rs1_p = (int16_t *)&rs1;
  int16_t *rs2_p = (int16_t *)&rs2;
  int16_t *rd_p = (int16_t *)&rd;
  for (uint32_t i = 0; i < 4; ++i) {
    *(rd_p + i) = *(rs1_p + i) + *(rs2_p + i);
    printf("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n");
    printf("---------------- rd_p= 0x%x ------------\n", *(rd_p + i));
    printf("---------------- rd = 0x%lx -------------\n", rd);
    printf("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n");
  }
  return 0;
}

Result:

$ ./test_O3 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x1111 ------------
---------------- rd = 0x1111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x1111 ------------
---------------- rd = 0x11111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x11 ------------
---------------- rd = 0x1111111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------- rd_p= 0x0 ------------
---------------- rd = 0x1111111111 -------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
