Made mmap-related code compatible with Cygwin #10

mvanleeu · 2022-03-26T07:45:46Z

The whole logic of first reserving the memory plus a guard page, only to overwrite it afterwards with a mapping at a fixed location, does not work on Cygwin (startp_ gets a value of 0xffffffffffff). So I created a Cygwin path and simplified the mmap as much as possible.
I didn't bother to #ifdef all references to guardp_, though that might have been cleaner.

Testing has been limited to the benchmark files, which fit in RAM.

dw · 2022-03-26T11:12:17Z

I haven't touched this code in a long time, but IIRC it is always necessary to be able to run 15 bytes over the end of the buffer, to allow pcmpistri to begin on the last byte (which might be necessary depending on the length of the file and/or the data appearing earlier in the buffer).

Won't merge this one without a bit more clarity; it looks like it might be breaking that requirement.
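
To make the 15-byte requirement concrete, here is a minimal sketch (not code from the PR) that computes how many bytes beyond the readable mapping a 16-byte load starting at the last byte of the file would touch. It assumes a 16-byte load width, as with pcmpistri, a page size of mmap.PAGESIZE (commonly 4096), and that the kernel maps whole pages, zero-filling the partial last page of the file as POSIX mmap requires. The helper name overrun_bytes and its trailing_pages parameter are hypothetical; trailing_pages models extra readable pages after the file, on the assumption that this is the slack the reserved guard page was meant to provide.

```python
import mmap


def overrun_bytes(file_size: int, width: int = 16,
                  page_size: int = mmap.PAGESIZE, trailing_pages: int = 0) -> int:
    """Bytes beyond the readable region touched by a width-byte load that
    starts at the last byte of a file_size-byte mapped file (hypothetical helper)."""
    if file_size <= 0:
        raise ValueError("an empty file cannot be mapped")
    # The file mapping is readable up to the next page boundary, because
    # whole pages are mapped and the partial last page is zero-filled.
    mapped_end = -(-file_size // page_size) * page_size
    # Assumption: any extra pages reserved after the file are also readable.
    readable_end = mapped_end + trailing_pages * page_size
    load_end = (file_size - 1) + width  # one past the last byte the load reads
    return max(0, load_end - readable_end)
```

With 4096-byte pages, every file of up to 4081 bytes gives 0, so small files never touch the boundary. The risky sizes are 4082 to 4096 bytes and the same window just below or at each higher multiple of 4096; with trailing_pages=1 those cases also return 0.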

Verify that the SIMD instructions do not read past the end of the mmap buffer. Test files are length-{n}.csv with n = 0 to 15, where n is the file size of length-n.csv modulo 16, since the SIMD instruction under test processes 16 bytes at a time. Added a separate unittest file because the existing one crashed for reasons not yet understood.
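
A sketch of how such fixtures could be generated. The helpers make_csv and write_fixtures, the a,b header and the 'p' padding are hypothetical and are not the files actually uploaded. make_csv pads the first field so the file has an exact byte length and fills the last field with a chosen number of 'x' characters, following the scheme where length-n.csv has size % 16 == n and a last field of n + 1 'x' characters.

```python
from pathlib import Path


def make_csv(total_len: int, x_count: int) -> bytes:
    """Build CSV bytes of exactly total_len bytes: a header line, then one
    row whose first field is padding and whose last field is x_count 'x's."""
    header = b"a,b\n"
    tail = b"," + b"x" * x_count
    pad_len = total_len - len(header) - len(tail)
    if pad_len < 1:
        raise ValueError("total_len is too small for the requested x_count")
    return header + b"p" * pad_len + tail


def write_fixtures(directory: Path, base_len: int = 32) -> list:
    """Write length-0.csv ... length-15.csv into directory, where
    length-n.csv has size % 16 == n and a last field of n + 1 'x's."""
    paths = []
    for rest in range(16):
        size = base_len - base_len % 16 + rest  # size % 16 == rest
        path = directory / f"length-{rest}.csv"
        path.write_bytes(make_csv(size, rest + 1))
        paths.append(path)
    return paths
```

Because make_csv takes an exact length, the same helper can place the files next to a page boundary: write_fixtures(some_dir, base_len=4080) produces sizes 4080 to 4095, assuming 4096-byte pages.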

mvanleeu · 2022-03-27T17:11:23Z

Indeed, the csvmonkey description mentions parsing 16 bytes at a time via SIMD.
To check whether I was just lucky running the test in an ideal scenario, I doctored some small test data files with slightly different lengths (file length modulo 16 == rest_value for length-{rest_value}.csv), with the last field of each file filled with rest_value + 1 'x' characters. I created a test case that reads and parses the last field of the file and verifies that the number of 'x' characters matches.
Somehow, none of the lengths made the test program crash or return wrong values, which I would have expected if the mmap caused the SIMD instruction to process data past the end of the mapping.
Please review whether this test case is actually testing the right thing.
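
One independent way to compute the expected value for such a test, without going through csvmonkey, is to map the file with Python's standard mmap module and slice out the last field. This is a hedged sketch: last_field and expected_x_count are hypothetical helpers, and they assume comma-separated fields, at most one trailing newline and no quoting. Python's mmap slicing is bounds-checked, so this reference check cannot itself read past the end of the mapping.

```python
import mmap


def last_field(buf) -> bytes:
    """Return the last comma- or newline-delimited field of buf, which may be
    bytes or an mmap object (both support len, slicing and rfind)."""
    end = len(buf)
    if end and buf[end - 1:end] == b"\n":
        end -= 1  # ignore a single trailing newline
    # The field starts after the last separator before `end`; rfind returns
    # -1 when there is none, so start becomes 0.
    start = max(buf.rfind(b",", 0, end), buf.rfind(b"\n", 0, end)) + 1
    return bytes(buf[start:end])


def expected_x_count(path: str) -> int:
    """Map the file read-only and return the length of its all-'x' last field."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        field = last_field(m)
    if field.strip(b"x"):
        raise ValueError(f"last field is not all 'x': {field!r}")
    return len(field)
```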

mvanleeu · 2022-03-27T17:14:56Z

$ python3 tests/csvmonkey_test_mmap.py
.
----------------------------------------------------------------------
Ran 1 test in 0.004s

OK

mvanleeu added 2 commits March 26, 2022 08:35

Indicated cygwin compatibility

355dea0

mvanleeu added 3 commits March 27, 2022 19:06

Add files via upload

1ee9ae2

Merge branch 'dw:master' into master

79af61e

Fixes access violation under Windows 64 bits.

7f81720
