forums.ps2dev.org Homebrew PS2, PSP & PS3 Development Discussions
Criptych
Joined: 12 Sep 2009 Posts: 79
|
Posted: Sun Jun 13, 2010 7:18 am Post subject: VFPU instruction |
|
|
While Googling how to add the 'vuc2i' instruction to gas (it apparently isn't supported yet), I found a reference to a 'vcmmul' instruction. Does anyone have more info on it? From the usage it looks like just a backward 'vmmul', but there must be something different about it to warrant a separate instruction, right?
_________________
PSP-2000 // CFW: 5.50 GEN-D2 ...and not upgrading until OFW supports homebrew!
(But I did downgrade to 1.50 with TimeMachine...)
"I want you to tell me how the machine makes you feel."
Last edited by Criptych on Sun Jun 20, 2010 9:35 am; edited 1 time in total |
|
hlide
Joined: 10 Sep 2006 Posts: 750
|
Posted: Sun Jun 13, 2010 7:50 pm Post subject: |
|
|
| This is just vmmul. vcmmul and vrmmul are assembler aliases, where c stands for column and r for row. The assembler simply swaps/transposes the register operands and emits a vmmul. |
|
Criptych
Joined: 12 Sep 2009 Posts: 79
|
Posted: Mon Jun 14, 2010 1:03 am Post subject: |
|
|
| hlide wrote: | | This is just vmmul. vcmmul and vrmmul are assembler aliases, where c stands for column and r for row. The assembler simply swaps/transposes the register operands and emits a vmmul. |
So it's an assembler shortcut, like 'u[ls]v'? Does that mean 'vrmmul' is identical to a regular 'vmmul'? |
|
hlide
Joined: 10 Sep 2006 Posts: 750
|
Posted: Mon Jun 14, 2010 6:01 am Post subject: |
|
|
vmmul.q M000, M100, M200
<==>
vrmmul.q M000, M100, M200
<==>
vcmmul.q E000, E200, E100 |
|
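The equivalence hlide lists is consistent with the matrix identity (A*B)^T = B^T * A^T: viewing all three matrices transposed (which, as I understand the register naming, is what the E prefix does) and swapping the two sources describes the same product. The C sketch below only checks that identity on plain 4x4 arrays. It does not model the VFPU register file or the exact semantics of vmmul, and Mat4, mat_mul, mat_transpose, mat_equal, mul_via_transposed_views and check_transpose_identity are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define N 4

/* Hypothetical 4x4 matrix type; not a model of VFPU registers. */
typedef struct { float m[N][N]; } Mat4;

/* Plain product: returns a * b. */
static Mat4 mat_mul(const Mat4 *a, const Mat4 *b) {
    Mat4 out;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            for (int k = 0; k < N; k++)
                sum += a->m[i][k] * b->m[k][j];
            out.m[i][j] = sum;
        }
    }
    return out;
}

static Mat4 mat_transpose(const Mat4 *a) {
    Mat4 out;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            out.m[j][i] = a->m[i][j];
    return out;
}

static bool mat_equal(const Mat4 *a, const Mat4 *b) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (a->m[i][j] != b->m[i][j])
                return false;
    return true;
}

/* Same product computed on transposed views with the sources swapped:
 * (a * b)^T = b^T * a^T, so transposing that result gives a * b again. */
static Mat4 mul_via_transposed_views(const Mat4 *a, const Mat4 *b) {
    Mat4 at = mat_transpose(a);
    Mat4 bt = mat_transpose(b);
    Mat4 ct = mat_mul(&bt, &at);
    return mat_transpose(&ct);
}

/* Self-check on small integer-valued inputs, so the float sums are exact. */
static void check_transpose_identity(void) {
    Mat4 a, b;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            a.m[i][j] = (float)(i * N + j + 1);
            b.m[i][j] = (float)((i + 1) * (j + 2) % 7);
        }
    }
    Mat4 direct = mat_mul(&a, &b);
    Mat4 swapped = mul_via_transposed_views(&a, &b);
    assert(mat_equal(&direct, &swapped));
}
```

Calling check_transpose_identity() from a test program exercises the identity; a non-commutative pair of inputs is used on purpose so that a plain operand swap without the transposes would not pass.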
Criptych
Joined: 12 Sep 2009 Posts: 79
|
Posted: Mon Jun 14, 2010 11:28 am Post subject: |
|
|
Got it, thanks. Learn something new every day... :) |
|
Criptych
Joined: 12 Sep 2009 Posts: 79
|
Posted: Sun Jun 20, 2010 9:39 am Post subject: |
|
|
While on the topic of the VFPU, can you explain the condition codes for the "f1" format of vcmp (in the list here)? I figure that EZ is "equal zero" and NZ is "not zero," but the others have me confused.
EDIT: Well, I've found that EN/NN test for NaNs and EI/NI for infinities, but ES/NS don't check the sign as I first thought. In fact, every value other than NaN or infinity that I've tried so far gives me CC=0. Does that mean it tests for "special" values? |
|
hlide
Joined: 10 Sep 2006 Posts: 750
|
Posted: Tue Jun 22, 2010 12:35 am Post subject: |
|
|
Ex = equal to
Nx = not equal to
xI = infinity
xN = NaN
xS = special, that is, infinity or NaN
So: ES == EI|EN and NS == NI&NN |
|
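hlide's table can be restated with the standard C classification macros. This is only a sketch of the table, not of the hardware: UnaryCond and cond_holds are hypothetical names, the EZ/NZ rows follow Criptych's reading ("equal zero" / "not zero"), and how real hardware treats -0.0 or denormals is not covered in the thread, so the sketch simply follows C's == comparison.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Enumerators mirror the vcmp mnemonics discussed in the thread. */
typedef enum {
    CC_EZ, CC_NZ,   /* equal / not equal to zero (Criptych's reading) */
    CC_EN, CC_NN,   /* is / is not NaN */
    CC_EI, CC_NI,   /* is / is not infinity */
    CC_ES, CC_NS    /* is / is not "special" (infinity or NaN) */
} UnaryCond;

/* Returns whether the single-operand condition holds for v. */
static bool cond_holds(UnaryCond cc, float v) {
    bool is_nan = isnan(v) != 0;
    bool is_inf = isinf(v) != 0;
    bool is_special = is_nan || is_inf;

    switch (cc) {
    case CC_EZ: return v == 0.0f;
    case CC_NZ: return !(v == 0.0f);
    case CC_EN: return is_nan;
    case CC_NN: return !is_nan;
    case CC_EI: return is_inf;
    case CC_NI: return !is_inf;
    case CC_ES: return is_special;
    case CC_NS: return !is_special;
    }
    assert(!"unknown condition code");
    return false;
}
```

Note that "not special" means neither infinity nor NaN, so NS corresponds to NI and NN both holding, which is what the last line of the sketch's switch computes.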
Criptych
Joined: 12 Sep 2009 Posts: 79
|
Posted: Tue Jun 22, 2010 12:58 am Post subject: |
|
|
Thanks for confirming that.
The C equivalent of what I'm trying to write is this:
| Code: | | if(x < 0) r = (y < 0) ? (-PI - r) : (PI - r); |
What I have in assembly (r is in s000, x & y in c010):
| Code: | vzero.p c012
vcst.s s001, VFPU_PI
vcmp.p LT, c010, c012
vcmovt.s s001, s001[-x], 0
vsub.s s001, s001, s000
vcmovt.s s000, s001, 1 |
Can you suggest anything to improve it? |
|
hlide
Joined: 10 Sep 2006 Posts: 750
|
Posted: Tue Jun 22, 2010 2:10 am Post subject: |
|
|
I'm about to hit the road, so you'll probably need to wait until tomorrow. Could you post the "else" part of your "if (x < 0)"?
Branching costs a lot, and if I know what happens when x >= 0, I may find a better approach overall. |
|
Criptych
Joined: 12 Sep 2009 Posts: 79
|
Posted: Tue Jun 22, 2010 6:46 am Post subject: |
|
|
| hlide wrote: | I'm about to hit the road, so you'll probably need to wait until tomorrow. Could you post the "else" part of your "if (x < 0)"?
Branching costs a lot, and if I know what happens when x >= 0, I may find a better approach overall. |
There is no "else," actually. The whole function (an implementation of atan2) would look something like this in C:
| Code: | float fast_atan2(float y, float x)
{
    float r = asinf(y/hypotf(x, y));
    if(x < 0) r = (y < 0) ? (-PI - r) : (PI - r);
    return r;
} |
My original version had the first line in assembly and the rest in C, because I was just starting out with the VFPU, but I want to rewrite it so everything is done on one (co)processor instead of going back and forth between the VFPU and the FPU.
The whole assembly version is this:
| Code: |
fast_atan2:
mtv $a1, s010
mtv $a0, s011
vcst.s s003, VFPU_PI_2
vdot.p s000, c010, c010
vrsq.s s000, s000
vmul.s s000, s011, s000
vasin.s s000, s000
vmul.s s000, s000, s003
vzero.p c012
vcst.s s001, VFPU_PI
vcmp.p LT, c010, c012
vcmovt.s s001, s001[-x], 0
vsub.s s001, s001, s000
vcmovt.s s000, s001, 1
j $ra
mfv $v0, s000 |
I know: "make it work, then make it work fast"; but I'm doing this for practice as much as to write something useful. |
|
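Why the quadrant fix-up in the C version of fast_atan2 works, as a short derivation. It assumes (x, y) is not (0, 0); the symbols h and \theta are names introduced here, not taken from the post.

```latex
\begin{aligned}
h &= \sqrt{x^2 + y^2}, \qquad
r = \arcsin\!\left(\frac{y}{h}\right) \in \left[-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right], \\
\theta &= \operatorname{atan2}(y, x) \quad\text{satisfies}\quad \sin\theta = \frac{y}{h}, \\
x \ge 0 &: \quad \theta \in \left[-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right]
  \;\Rightarrow\; \theta = r, \\
x < 0,\; y \ge 0 &: \quad \theta \in \left(\tfrac{\pi}{2}, \pi\right]
  \;\Rightarrow\; \theta = \pi - r, \\
x < 0,\; y < 0 &: \quad \theta \in \left(-\pi, -\tfrac{\pi}{2}\right)
  \;\Rightarrow\; \theta = -\pi - r.
\end{aligned}
```

In both x < 0 cases the sine is preserved, since sin(pi - r) = sin(-pi - r) = sin r, which is why only the sign of y decides between +PI and -PI in the C code.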
hlide
Joined: 10 Sep 2006 Posts: 750
|
Posted: Tue Jun 22, 2010 11:20 am Post subject: |
|
|
Yours:
| Code: |
fast_atan2:
mtv $a1, s010 // 1:1(3)
mtv $a0, s011 // 2:1(3)
vcst.s s003, VFPU_PI_2 // 3:1(3)
vdot.p s000, c010, c010 // 5:1(7) *STALLING because mtv needs 3 cycles to be completed*
vrsq.s s000, s000 // 12:1(7) *STALLING because vdot.p needs 7 cycles to be completed*
vmul.s s000, s011, s000 // 19:1(5) *STALLING because vrsq.s needs 7 cycles to be completed*
vasin.s s000, s000 // 24:1(7) *STALLING because vmul.s needs 5 cycles to be completed*
vmul.s s000, s000, s003 // 31:1(7) *STALLING because vasin.s needs 7 cycles to be completed*
vzero.p c012 // 32:1(3)
vcst.s s001, VFPU_PI // 33:1(3)
vcmp.p LT, c010, c012 // 35:1(3) *STALLING because vzero.p needs 3 cycles to be completed*
vcmovt.s s001, s001[-x], 0 // 42:1+1(5) *STALLING because you need 7 instructions before using vcmp.p result*
vsub.s s001, s001, s000 // 48:1(3) *STALLING because vcmovt.s needs 5 cycles to be completed*
vcmovt.s s000, s001, 1 // 51:1(5) *STALLING because vsub.s needs 3 cycles to be completed*
j $ra // 52:1
mfv $v0, s000 // 56:7 *STALLING because vcmovt.s needs 5 cycles to be completed*
|
mfv takes 7 cycles no matter what the following instruction is, so your function takes around 63 cycles.
Reordering some of your instructions (untested):
| Code: |
fast_atan2:
mtv $a1, s010 // 1:1(3)
mtv $a0, s011 // 2:1(3)
vdot.p s000, c010, c010 // 5:1(7) *STALLING because mtv needs 3 cycles to be completed*
vcmp.p LT, c010, c010[0,0] // 6:1+1(3)
vcst.s s003, VFPU_PI_2 // 7:1(3)
vcst.s s001, VFPU_PI // 8:1(3)
vrsq.s s000, s000 // 12:1(7) *STALLING because vdot.p needs 7 cycles to be completed*
vmul.s s000, s011, s000 // 19:1(5) *STALLING because vrsq.s needs 7 cycles to be completed*
vcmovt.s s001, s001[-x], 0 // 20:1+1(5)
vasin.s s000, s000 // 24:1(7) *STALLING because vmul.s needs 5 cycles to be completed*
vmul.s s000, s000, s003 // 31:1(7) *STALLING because vasin.s needs 7 cycles to be completed*
vsub.s s001, s001, s000 // 38:1(3) *STALLING because the preceding vmul.s has not completed yet*
vcmovt.s s000, s001, 1 // 41:1(5) *STALLING because vsub.s needs 3 cycles to be completed*
j $ra // 42:1
mfv $v0, s000 // 46:7 *STALLING because vcmovt.s needs 5 cycles to be completed*
|
This should take around 53 cycles.
Or:
| Code: |
fast_atan2:
mtv $a1, s010 // 1:1(3)
mtv $a0, s011 // 2:1(3)
vcst.s s001, VFPU_PI // 3:1(3)
vcst.s s002, VFPU_PI_2 // 4:1(3)
vslt.p c020, c010, c020[0,0] // 5:1+1(3) // (x < 0 ? 1.0 : 0.0, y < 0 ? 1.0 : 0.0)
vdot.p s000, c010, c010 // 7:1(7)
vsge.p c022, c010, c022[0,0] // 9:1+1(3) // (x >= 0 ? 1.0 : 0.0, y >= 0 ? 1.0 : 0.0)
vmul.s s001, s020, s001 // 10:1(5) // PI * (x < 0 ? 1.0 : 0.0)
vsub.p c012, c022, c020 // 11:1(3) // (x < 0 ? -1.0 : 1.0, y < 0 ? -1.0 : 1.0)
vrsq.s s000, s000 // 14:1(7) *STALLING*
vmul.p c002, c012, c002 // 15:1(5) // (x < 0 ? -PI/2 : +PI/2, PI * (x < 0 ? 1.0 : 0.0) * (y < 0 ? -1.0 : 1.0))
vasin.s s000, s000 // 21:1(7) *STALLING*
//vmul.s s000, s000, s002 //
//vadd.s s000, s000, s003
vdot.p s000, c002, c000[x, 1] // 28:1+1(7) *STALLING*
j $ra // 30:1
mfv $v0, s000 // 36:7 *STALLING*
|
This should take around 43 cycles. |
|
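hlide's last variant replaces the conditional moves with arithmetic on 0.0/1.0 masks of the kind his comments attribute to vslt and vsge. The idea can be sketched in C as below. This models only the arithmetic, not the register allocation or the exact instruction semantics of the listing; SKETCH_PI, fixup_branchy and fixup_branch_free are hypothetical names, and NaN inputs are outside the sketch (both masks would come out as 0).

```c
#include <assert.h>

/* Assumed single-precision value of pi for the sketch. */
static const float SKETCH_PI = 3.14159265358979323846f;

/* Reference: the quadrant fix-up exactly as written in the C version. */
static float fixup_branchy(float r, float x, float y) {
    if (x < 0.0f)
        r = (y < 0.0f) ? (-SKETCH_PI - r) : (SKETCH_PI - r);
    return r;
}

/* Same fix-up with comparisons turned into 0.0/1.0 masks and no branches. */
static float fixup_branch_free(float r, float x, float y) {
    assert(x == x && y == y);          /* NaN inputs are not handled here */

    float x_lt = (float)(x < 0.0f);    /* 1.0 if x < 0, else 0.0 */
    float y_lt = (float)(y < 0.0f);
    float x_ge = (float)(x >= 0.0f);   /* 1.0 if x >= 0, else 0.0 */
    float y_ge = (float)(y >= 0.0f);

    float sign_x = x_ge - x_lt;        /* -1.0 for x < 0, +1.0 otherwise */
    float sign_y = y_ge - y_lt;        /* -1.0 for y < 0, +1.0 otherwise */

    /* x >= 0: +r + 0;  x < 0: -r + PI (y >= 0) or -r - PI (y < 0). */
    return sign_x * r + SKETCH_PI * x_lt * sign_y;
}
```

Both functions take the already computed r = asin(y / hypot(x, y)) and should agree for any non-NaN x and y; the branch-free form trades two selects for a handful of multiplies and adds, which is the trade hlide's listing makes.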
Criptych
Joined: 12 Sep 2009 Posts: 79
|
Posted: Wed Jun 23, 2010 7:32 am Post subject: |
|
|
*facepalm* Okay, I'm still getting used to accounting for the pipeline, but I didn't even consider vslt or some of your other alternatives; I can tell I've got a lot to learn about this. Thanks for your help. :-) |
|
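The issue cycles in hlide's annotations can be reproduced with a very simple in-order model: an instruction issues one cycle after the previous one at the earliest, and no earlier than the cycle in which all of its source registers are ready. The C sketch below is written under that assumption only and is not a description of the real VFPU pipeline. The Instr type, the register enum, kProgram and schedule() are hypothetical, and the latencies are the ones quoted in the thread.

```c
#include <assert.h>

#define NO_REG   (-1)
#define MAX_SRCS 2

/* Hypothetical ids for the few VFPU registers used below. */
enum { S000, S003, S010, S011, NUM_REGS };

typedef struct {
    const char *text;     /* mnemonic, kept for readability only */
    int dest;             /* register written, or NO_REG */
    int srcs[MAX_SRCS];   /* registers read, NO_REG for unused slots */
    int latency;          /* cycles until dest is ready (from the thread) */
} Instr;

/* First eight instructions of the original routine. */
static const Instr kProgram[] = {
    { "mtv $a1, s010",           S010, { NO_REG, NO_REG }, 3 },
    { "mtv $a0, s011",           S011, { NO_REG, NO_REG }, 3 },
    { "vcst.s s003, VFPU_PI_2",  S003, { NO_REG, NO_REG }, 3 },
    { "vdot.p s000, c010, c010", S000, { S010,   S011   }, 7 },
    { "vrsq.s s000, s000",       S000, { S000,   NO_REG }, 7 },
    { "vmul.s s000, s011, s000", S000, { S011,   S000   }, 5 },
    { "vasin.s s000, s000",      S000, { S000,   NO_REG }, 7 },
    { "vmul.s s000, s000, s003", S000, { S000,   S003   }, 5 },
};
#define PROGRAM_LEN ((int)(sizeof kProgram / sizeof kProgram[0]))

/* Fills issue[i] with the cycle in which instruction i issues. */
static void schedule(const Instr *prog, int n, int *issue) {
    int ready[NUM_REGS] = { 0 };   /* cycle at which each register is ready */
    int prev = 0;                  /* issue cycle of the previous instruction */

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int cycle = prev + 1;      /* at most one instruction per cycle */
        for (int s = 0; s < MAX_SRCS; s++) {
            int r = prog[i].srcs[s];
            if (r != NO_REG && ready[r] > cycle)
                cycle = ready[r];  /* stall until the source is ready */
        }
        issue[i] = cycle;
        if (prog[i].dest != NO_REG)
            ready[prog[i].dest] = cycle + prog[i].latency;
        prev = cycle;
    }
}
```

Fed kProgram, the model gives issue cycles 1, 2, 3, 5, 12, 19, 24, 31, which matches the first eight lines of the annotated listing. Reordering entries in the table is a quick way to see how independent instructions can fill the stall slots.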