Original post:

I have a critical portion of code that essentially involves the multiplication of many pre-computed matrices by many pre-computed vectors. Each matrix multiplies the same, entire list of vectors. Each matrix is the same size (N1 x N2) in my current form, but eventually I'd like to remove this requirement, so that each matrix-vector combination can have arbitrary size. In Matlab I might think of this as a cell array of matrices and a cell array of vectors. But for now I'm just trying to understand performant Julia on this "array of matrices" case.

Pseudocode: M(N1,N2,nMatrix) x D(N2,nVector) => R(N1,nVector,nMatrix)

    nMatrix = 1000  # Usually varies between 1e4 - 1e7, sometimes as high as 1e8
    nVector = 1000  # Usually varies between 1e4 and 2e5
    for i = 1:nMatrix  # Iterate through the matrices
        for j = 1:nVector  # Iterate through the vectors
            R = M * V  # Matrix-matrix multiplication is equivalent to the naive approach

Which isn't too shabby. While not quite as fast as single-threaded Matlab 2018b, we're on the same order of magnitude, so I chalk this up to different math libraries:

    % Matlab 2018b comparison
    LASTN = maxNumCompThreads(1);
    disp()

Are there any thoughts on this? As a long-time Matlab user, that's the Julia that seems most natural to me… but are there other things I've missed… preferably without completely obfuscating my code?

Assuming that I have mostly maxed out the single-threaded performance of Julia, I've looked into parallel computation. I have a 12-physical-core workstation (and access to a 72-core node), and it sure would seem like I could parallelize across the matrices. But when I naively try to use multi-threading:

    # export JULIA_NUM_THREADS=6
    R = zeros(N1, nVector)
    for i = 1:nMatrix  # Iterate through the matrices

    # 34.264 ms (2976 allocations: 106.60 MiB)

tracking the julia.exe process in Task Manager shows >14 GB RAM utilization (128 GB available), and the run takes about a minute. So not only is the code not "6 threads = ~6x faster", it's eating up a lot of RAM too. As mentioned in my comments on the first Julia code, the total number of matrices is usually very large (1e4 - 1e8), so I think I'd rather use shared memory so that I don't pay nProc * sizeof(M) for multiple processes holding their own copies of the data.

Reply 1:

You probably want to use the mighty GEMM. Your job can be expressed as a single matrix-matrix multiplication if you use the proper memory layout.

    julia> nMatrix = 1000  # Usually varies between 1e4 - 1e7, sometimes as high as 1e8
    julia> nVector = 1000  # Usually varies between 1e4 and 2e5
    0.080671 seconds (6 allocations: 53.406 MiB, 77.77% gc time)
    0.132145 seconds (3.01 k allocations: 107.407 MiB, 46.73% gc time)

Eventually he says he wants all of the matrices to be different sizes, in which case this won't work.

Reply 2:

OP wants to apply a bunch of linear operators to a single vector. Take the direct sum of the codomains of all the linear operators you want to apply; then you just need to apply a single large linear operator to a single vector. Now OP has multiple vectors: make them a matrix, and it is a general matrix-matrix product. Perfect exercise for a linear algebra course (the universal-algebra definition of the product); I'm gonna steal this.

Reply 3:

Your matrix M is vcat(Ms...). We even have a function for the cartesian product / direct sum, but you probably need to build this matrix by hand (vcat of a splatted giant list of matrices could be ugly). That being said, OP only contracts N2 = 8 dimensions. Specializing the code to this number might be a large advantage, as pointed out (fully unroll the N2 loop during the matrix product), as long as the cache is well used (meaning: recursive subdivision / a Hilbert curve for the order of the N2-dimensional dot products). For reference, as it might be useful for your linear algebra course, this problem comes from FEA.
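The "single GEMM via the proper memory layout" suggestion can be sketched as follows. This is a minimal sketch, not the poster's actual code: the names (`Ms`, `V`, `R`) and the small dimensions are illustrative assumptions, and `reduce(vcat, Ms)` stands in for whatever stacking the poster used.

```julia
# Sketch: many same-size matrix-vector products expressed as one big GEMM.
# All names and sizes here are illustrative, not taken from the thread.
N1, N2 = 4, 8            # each small matrix is N1 x N2
nMatrix, nVector = 100, 50

Ms = [rand(N1, N2) for _ in 1:nMatrix]   # the pre-computed matrices
V  = rand(N2, nVector)                   # the vectors, stored as columns

# Stack the matrices into one (N1*nMatrix) x N2 matrix, then do one GEMM.
M = reduce(vcat, Ms)     # avoids splatting a giant list into vcat(...)
R = M * V                # one (N1*nMatrix) x nVector matrix-matrix product

# Row block i of R equals Ms[i] * V:
@assert R[(1:N1) .+ (2 - 1) * N1, :] ≈ Ms[2] * V
```

A single large `M * V` hands the whole job to the BLAS GEMM kernel at once, which is usually far better than looping over many small products, at the cost of materializing the stacked matrix.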
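For the case where the matrices (and their vectors) have different sizes, the direct-sum idea amounts to one block-diagonal operator applied to one stacked vector. A hedged sketch, using `blockdiag` from the SparseArrays standard library — my choice of tool, not one the thread names, and the sizes below are made up:

```julia
using SparseArrays

# Sketch (assumed setup): matrices of different sizes, each with its own vector.
As = [rand(3, 4), rand(2, 5), rand(6, 3)]
xs = [rand(4), rand(5), rand(3)]

# Direct sum: one block-diagonal operator, one concatenated input vector.
A = blockdiag(sparse.(As)...)   # (3+2+6) x (4+5+3) block-diagonal matrix
x = reduce(vcat, xs)            # stacked input vector
y = A * x                       # one large operator applied to one vector

# y is the concatenation of the individual products As[i] * xs[i]:
@assert y ≈ reduce(vcat, [As[i] * xs[i] for i in 1:3])
```

The sparse representation stores only the diagonal blocks, so the off-block zeros cost nothing; a dense block-diagonal matrix would waste memory quadratically in the number of blocks.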