Sunday, April 11, 2021

Merging Terabytes of Files - Part 1

Problem: Given terabytes of data, spread across millions of files on multiple drives, merging the dataset is a daunting task.


Solution

There are several things that need to happen.

  1. Duplicate files need to be identified, even if they have different names and dates.
  2. Duplicate directories need to be found, even if the directories, subdirectories, and files have different names.
  3. Similar directories should be merged and naming differences reconciled.

This problem can be broken down into three steps. The first is to build a catalog of the directories and files. The second is to process the catalog to find duplicates and similarities. The third and final step is to build the new, merged file system.
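One possible way to approach the duplicate-directory requirement is to give each directory a signature built only from the content hashes of the files beneath it, so that names play no part. The sketch below is an assumption about how this could be done, not the method used later in this series; `directory_signatures` and the sample paths and hashes are hypothetical.

```python
import hashlib
import posixpath
from collections import defaultdict


def directory_signatures(file_hashes):
    """Map each directory to a signature of all file hashes found beneath it.

    file_hashes: dict of full file path (forward slashes) -> content hash string.
    File and folder names are ignored, so renamed copies get the same signature.
    """
    contents = defaultdict(list)
    for path, file_hash in file_hashes.items():
        parent = posixpath.dirname(path)
        # credit the file to its own directory and to every ancestor directory
        while parent:
            contents[parent].append(file_hash)
            up = posixpath.dirname(parent)
            if up == parent:
                break
            parent = up
    return {
        directory: hashlib.md5("\n".join(sorted(hashes)).encode()).hexdigest()
        for directory, hashes in contents.items()
    }


# hypothetical catalog entries: path -> md5 (shortened for readability)
files = {
    "D:/photos/2019/a.jpg": "h1",
    "D:/photos/2019/b.jpg": "h2",
    "E:/backup/old/img1.jpg": "h1",
    "E:/backup/old/img2.jpg": "h2",
    "E:/backup/notes.txt": "h3",
}

signatures = directory_signatures(files)

# group directories that share a signature
matches = defaultdict(list)
for directory, signature in signatures.items():
    matches[signature].append(directory)

for directories in matches.values():
    if len(directories) > 1:
        print("same content:", sorted(directories))
```

Because the signature covers everything beneath a directory, a parent that holds nothing but one duplicated subdirectory will also be reported as a match; a real merge tool would likely want to filter those out.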


Step #1 - Cataloging the file system


To catalog the files, the following is done.
  1. Each directory is crawled to collect the following:
    • filename
    • file extension
    • full file path
    • md5 hash (to identify the file by its content, regardless of name or date)
    • creation date
    • modification date
    • file size
  2. A text output file is generated with all of this data



The Cataloging Script


For Python 3.8+, the following script will build a catalog with the path, names, size, dates, and hash for each file. Once created, it is easy to load this file in Excel and review the catalog.


import glob
import os
import pathlib
import datetime
import hashlib

# starting points for the catalog
root_dir = 'D:/' 
# file to store the catalog
outfilename = "Directory-Report.txt"

with open(outfilename, 'w') as outfile:

    for fullfilename in glob.iglob(root_dir + '**/**', recursive=True):
        print(fullfilename)

        path, filenameext = os.path.split(fullfilename)
        filename, file_extension = os.path.splitext(filenameext)

        # get md5 hash, reading in chunks so large files never sit fully in memory
        try:
            file_hash = hashlib.md5()
            with open(fullfilename, "rb") as f:
                while chunk := f.read(8192):
                    file_hash.update(chunk)
            # finalize once after the whole file is read (this also covers empty files)
            file_hash_str = file_hash.hexdigest()
        except OSError:
            # directories and unreadable files get a blank hash
            file_hash_str = ""

        # get create date (st_ctime is the creation time on Windows)
        fname = pathlib.Path(fullfilename)
        createdate = datetime.datetime.fromtimestamp(fname.stat().st_ctime)
        # get modification date
        moddate = datetime.datetime.fromtimestamp(fname.stat().st_mtime)
        # get file size
        size = os.path.getsize(fullfilename)

        outfile.write('"%s","%s","%s","%s","%s",%s,%s,%s,%s\n' %
              (fullfilename,path,filenameext,filename,file_extension,createdate,moddate,size,file_hash_str))
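One way the second step (finding duplicate files) could start is by grouping catalog rows on size and hash. This is a minimal sketch, assuming the nine-column layout written by the script above; `find_duplicates` and the sample rows are hypothetical, and an in-memory sample stands in for "Directory-Report.txt".

```python
import csv
import io
from collections import defaultdict


def find_duplicates(catalog_lines):
    """Group catalog rows by (size, md5) and keep groups with more than one path."""
    groups = defaultdict(list)
    for row in csv.reader(catalog_lines):
        if len(row) != 9:
            continue  # skip malformed rows
        fullfilename, size, file_hash = row[0], row[7], row[8]
        if not file_hash:
            continue  # directories and unreadable files have a blank hash
        groups[(size, file_hash)].append(fullfilename)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


# hypothetical rows in the same layout as the catalog file
sample = io.StringIO(
    '"D:/a/photo.jpg","D:/a","photo.jpg","photo",".jpg",'
    '2020-01-01 10:00:00,2020-01-02 11:00:00,1024,abc123\n'
    '"E:/backup/IMG_0001.jpg","E:/backup","IMG_0001.jpg","IMG_0001",".jpg",'
    '2021-03-05 09:30:00,2021-03-05 09:30:00,1024,abc123\n'
    '"D:/a/notes.txt","D:/a","notes.txt","notes",".txt",'
    '2020-01-01 10:00:00,2020-01-01 10:00:00,20,def456\n'
    '"D:/a","D:","a","a","",'
    '2020-01-01 10:00:00,2020-01-01 10:00:00,0,\n'
)

duplicates = find_duplicates(sample)
for (size, file_hash), paths in duplicates.items():
    print(size, file_hash, paths)
```

To run it against the real catalog, pass an open file handle for the report instead of the sample. Including the size in the key is a cheap extra guard alongside the md5 value.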
                   
                   

Tuesday, February 16, 2021

Simulating and Visualizing 3D Sine Waves and Interference Using Blender - An Antenna Array Factor Example

Complex equations make it hard to understand how the behavior of a system changes with different designs. Visualizing the result of design changes helps develop insight and can lead to better designs. Matlab, Python, and other tools are typically used for this kind of exploration. However, other tools can also be very useful.

Blender is used to visualize scientific results from simulations and scripts. Surprisingly, it can also be used to simulate and gain insight into real physical systems. The animation below shows a visualization of the interactions that contribute to the 'Array Factor' for an array of antennas as the spacing between 5 elements varies. Because Blender is so heavily optimized for rendering and geometry, this animation can be interacted with and rendered in real time. It is generated by using the 'shaders' in Blender to calculate the array factor.


Discussion

The array factor describes how the radiation from an antenna is scaled when it is part of a phased array. A key aspect is that by putting antennas in arrays, the radiation becomes stronger in some directions and weaker in others. The array factor equation describes how the waves from each antenna are combined.
The equation is written with complex exponentials, and Euler's formula lets it be viewed as a sum of sines and cosines.
Expanding the array factor equation this way makes it evident that it can be simulated using a repeating wave. This is where Blender can be used.
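For reference, a common textbook form of the array factor for N identical elements spaced a distance d apart along a line is sketched below. The symbols (a_n for the real amplitude of element n, \beta_n for its phase, k for the wavenumber, \theta for the angle from the array axis) are assumed notation and may differ from the form used in this post.

```latex
AF(\theta) = \sum_{n=0}^{N-1} a_n \, e^{\,j\,(n k d \cos\theta + \beta_n)},
\qquad k = \frac{2\pi}{\lambda}

e^{j\phi} = \cos\phi + j\sin\phi
\;\Longrightarrow\;
\operatorname{Re}\{AF(\theta)\} = \sum_{n=0}^{N-1} a_n \cos\!\left(n k d \cos\theta + \beta_n\right)
```

Each term of the real part is a cosine with its own phase, which is the kind of repeating wave a texture can produce.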

In Blender, high-performance, highly optimized code is used to texture the geometry. This code is referred to as shaders. There are standard shaders, and it is even possible to write custom shaders in OSL (Open Shading Language). Shaders can be defined as functions of x, y, z in space along with inputs from geometry and other attributes in the Blender model.

By using the shader logic, it's possible to solve for the interference patterns in array antennas and visualize the results.


Details

The first step is defining sinusoidal waves that are a function of the origin of each antenna. This is done using the wave texture configured for spherical waves. The 'Vector' input locates the wave in space.




This wave texture is generated for each antenna in the array. In this example, there are 5 antennas, so there are 5 wave textures used.


To tie this into the antennas, the coordinates from each antenna are introduced into the shader using a texture coordinate from the antenna object. This way, if the antenna moves, then the origin of each wave moves.



This logic describes the radiation anywhere in the model. It is just a matter of assigning it to a volume or surface to visualize. This requires converting the output of the sine waves into a color, then defining how it is used to color a surface.
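The per-point arithmetic that the shader nodes perform can be sketched outside Blender. The example below is a rough stand-in, not the node setup itself: `summed_wave`, the half-wavelength spacing, and the sample points are assumptions chosen for illustration. It evaluates one spherical cosine wave per antenna at a point in space and averages them.

```python
import math


def summed_wave(point, antenna_positions, wavelength, phase=0.0):
    """Average of spherical cosine waves, one per antenna, evaluated at `point`."""
    k = 2.0 * math.pi / wavelength  # wavenumber
    total = 0.0
    for position in antenna_positions:
        r = math.dist(point, position)  # distance from this antenna to the point
        total += math.cos(k * r - phase)
    return total / len(antenna_positions)


wavelength = 1.0
spacing = 0.5 * wavelength
# 5 antennas in a line along x, centered on the origin
antennas = [(i * spacing, 0.0, 0.0) for i in range(-2, 3)]

# far from the array, perpendicular to it: the waves arrive nearly in phase
broadside = summed_wave((0.0, 100.0, 0.0), antennas, wavelength)
# far from the array, along its axis: neighboring waves arrive half a cycle apart
endfire = summed_wave((100.0, 0.0, 0.0), antennas, wavelength)

print("broadside:", round(broadside, 4))
print("endfire:  ", round(endfire, 4))
```

Moving an antenna only changes its entry in `antennas`, which mirrors how moving the antenna object moves the origin of its wave in the shader.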



Putting all of this together, the antennas can be positioned anywhere, and the resulting interference pattern of the waves will be quickly and correctly calculated, then visualized. The array factors for two different antenna arrays are shown below.




Additional detail and rendering options can be found in this Blender Stack Exchange answer: https://blender.stackexchange.com/questions/213194/i-need-to-simulate-the-interference-of-two-sinewaves/213260#213260




Monday, February 15, 2021

Animated GIFs for Scientific Visualization - An Example Antenna Array Animation using Python, Matplotlib & GIMP

This animation shows how the radiation pattern from an array of antennas changes as more antennas are added. With an animation, it's much easier to see quickly how the pattern changes with the number of antennas. Saving the result as a GIF file, rather than as a live animation in Python, allows it to be used on a website or in a PowerPoint presentation. Matplotlib (as of 3.3) can only write GIFs through an additional writer backend such as Pillow or ImageMagick, but this kind of animation can also be generated from saved frames using Python, Matplotlib, and GIMP.


Discussion

Often scientific information and simulation results need to be visualized to be explained and understood. Live animations in a script are useful, but distributing scripts has a lot of issues. Creating movies (mp4, avi) sometimes runs into encoder issues. GIF animations are a reliable way to share a lot of information. They can be embedded in PowerPoint, sent via email, or embedded in a website with minimal effort, and the results just work.

Once a script exists that generates a Matplotlib animation, it can usually be modified to generate a sequence of images and save them using the 'savefig' command. Several tools can then convert the images to an animated GIF. With GIMP, the GIF can be generated in a few minutes, and there are many options to control the speed and resolution of the result.


Steps to Generate An Animated GIF

  1. Generate a sequence of images of identical size using Matplotlib and save using savefig.
  2. Open GIMP
    • File -> Open as Layers
      • Select all image files
    • Image -> Mode -> Indexed
      • Generate optimum palette
      • Maximum number of colors: 256
      • Color dithering: Positioned
      • Enable dithering of transparency: checked
      • Select Convert
    • Filters -> Animation -> Optimize (for GIF)
    • File -> Export
      • Name the file with a .gif extension
      • Select 'Export'
      • On new dialog
        • Select as animation
        • Set timing to 20 msec

    Sample Script


    
    # -*- coding: utf-8 -*-
    
    import matplotlib.pyplot as plt
    from matplotlib.colors import BoundaryNorm
    from matplotlib.ticker import MaxNLocator
    import numpy as np
    
    
    
    # make these smaller to increase the resolution
    dx, dy = 0.005, 0.005
    
    # list of antenna locations
    wavelength = 0.1
    spacing = 0.25
    x_list = np.array([0,1,-1,2,-2,3,-3,4,-4,5,-5,6,-6,7,-7])*spacing*wavelength
    y_list = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    phase_deg_list = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,] 
    phase_offset_deg_list = range(0,720,9)
    
    num_antennas = [1+int(pp*8/720)*2 for pp in phase_offset_deg_list]
    
    
    
    for idx,(phase_anim,antennas) in enumerate(zip(phase_offset_deg_list,num_antennas)):
        
        # generate 2 2d grids for the x & y bounds
        y, x = np.mgrid[slice(-1, 1 + dy, dy),
                        slice(-1, 1 + dx, dx)]
        
        z_list= []
        for xx,yy,phase_offset in zip(x_list[:antennas],y_list,phase_deg_list):
            phase_shift = 2*np.pi*(phase_offset + phase_anim)/360
            zz = np.real(np.exp(1j*phase_shift)*np.exp(np.pi*2/wavelength*-1j*np.sqrt((x-xx)**2 + (y-yy)**2)))
            z_list += [zz]
        
        
        z = sum(z_list)/len(z_list)
        
        # x and y are bounds, so z should be the value *inside* those bounds.
        # Therefore, remove the last value from the z array.
        z = z[:-1, :-1]
        levels = MaxNLocator(nbins=15).tick_values(z.min(), z.max())
        
        
        # pick the desired colormap, sensible levels, and define a normalization
        # instance which takes data values and translates those into levels.
        cmap = plt.get_cmap('bwr')
        norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
        
        fig, (ax1) = plt.subplots(figsize=(12,10),dpi=75)
        
        
        # contours are *point* based plots, so convert our bound into point
        # centers
        ax1.plot(x_list,y_list,'o',linestyle='',markerfacecolor='black')
        ax1.plot(x_list[:antennas],y_list[:antennas],'ko',linestyle='',markerfacecolor='white')
        cf = ax1.contourf(x[:-1, :-1] + dx/2.,
                          y[:-1, :-1] + dy/2., z, levels=levels,
                          cmap=cmap,
                          vmin=-1, vmax=1.0)
        #plt.clim(-1,1)
        fig.colorbar(cf, ax=ax1)
        ax1.set_title('Number of Antennas = %i' % antennas)
        
        
        # adjust the layout so the title and tick labels don't overlap
        fig.tight_layout()
        plt.savefig('image-%03i.png' % idx)
        
        # close the figure so the loop does not block or accumulate open figures
        plt.close(fig)
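The field each antenna contributes in the script above reduces to a single cosine. Written out, with r_n the distance from antenna n to the grid point and \varphi_n its total phase in radians (these symbol names are introduced here for illustration; the script calls them `phase_shift`, `xx`, `yy`, and `antennas`):

```latex
z_n(x, y) = \operatorname{Re}\!\left\{ e^{\,j\varphi_n} \, e^{-j \frac{2\pi}{\lambda} r_n} \right\}
          = \cos\!\left( \varphi_n - \frac{2\pi}{\lambda} r_n \right),
\qquad r_n = \sqrt{(x - x_n)^2 + (y - y_n)^2}

z(x, y) = \frac{1}{N} \sum_{n=1}^{N} z_n(x, y)
```

Dividing by the number of active antennas N keeps z between -1 and 1, so every frame spans the same value range regardless of how many antennas are active.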
    
    

    Sunday, February 14, 2021

    Generating 3D Plots and Exporting the Geometry - An Example with Antenna Radiation in Matplotlib, PyVista and Blender

    This was made starting from a Python script using Matplotlib. This post shows how. 

    Discussion

    Matplotlib is a great tool for generating beautiful 2D and 3D images quickly. 


    However, sometimes it's desirable to generate something more visual and appealing to help explain a concept more clearly. Often this is not possible in Matplotlib, and other tools must be used. However, there is no facility in Matplotlib (as of 3.2.1) to export the plot surface for use in other tools.
    Fortunately, using almost the same syntax as Matplotlib, a 3D plot can be created in PyVista and exported in one of the common 3D file formats like stl, ply, or fbx. Once this is done, it's possible to import it into a tool like Blender and create a high quality rendering or animation to help explain a concept.

    This posting will cover the key steps in doing this and use the plotting of the radiation pattern of an antenna to help illustrate.
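To show what an exported surface actually contains, the sketch below builds a simple radiation-pattern surface and writes it as an ASCII STL file using only the standard library. It deliberately does not use PyVista, and it is not the workflow of this post; the short-dipole pattern |sin(theta)|, the grid resolution, and all function names are assumptions made for illustration.

```python
import math
import os
import tempfile


def dipole_radius(theta):
    """Normalized far-field magnitude of a short dipole (assumed example pattern)."""
    return abs(math.sin(theta))


def pattern_triangles(n_theta=18, n_phi=36):
    """Sample r(theta, phi) on a grid and split each grid cell into two triangles."""
    def vertex(i, j):
        theta = math.pi * i / n_theta
        phi = 2.0 * math.pi * j / n_phi
        r = dipole_radius(theta)
        return (r * math.sin(theta) * math.cos(phi),
                r * math.sin(theta) * math.sin(phi),
                r * math.cos(theta))

    triangles = []
    for i in range(n_theta):
        for j in range(n_phi):
            a, b = vertex(i, j), vertex(i + 1, j)
            c, d = vertex(i + 1, j + 1), vertex(i, j + 1)
            # cells touching the poles collapse to thin or zero-area triangles;
            # they are kept here to keep the sketch short
            triangles.append((a, b, c))
            triangles.append((a, c, d))
    return triangles


def write_ascii_stl(triangles, filename, name="pattern"):
    """Write triangles in the plain-text STL layout."""
    with open(filename, "w") as f:
        f.write("solid %s\n" % name)
        for tri in triangles:
            # normals are written as zero; this sketch assumes the importer recomputes them
            f.write("  facet normal 0 0 0\n    outer loop\n")
            for x, y, z in tri:
                f.write("      vertex %.6e %.6e %.6e\n" % (x, y, z))
            f.write("    endloop\n  endfacet\n")
        f.write("endsolid %s\n" % name)


triangles = pattern_triangles()
stl_path = os.path.join(tempfile.gettempdir(), "dipole_pattern.stl")
write_ascii_stl(triangles, stl_path)
print(len(triangles), "triangles written to", stl_path)
```

A mesh library handles the same job with far less code and also takes care of normals and welded seams, which is why a tool like PyVista is the practical choice for real plots.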


    Setup

    You'll need Python 3 with Matplotlib and PyVista, plus Blender, to replicate this. My personal setup uses the portable versions of Python 3 and Blender.
    1. Install WinPython portable (https://winpython.github.io/)
    2. Use pip to install PyVista from PyPI (https://pypi.org/project/pyvista/)
    3. Install Blender portable (https://www.blender.org/download/ and choose the portable version)

    Example

    Sunday, March 18, 2012

    Installing sailfish-CFD under PythonXY

    image

    Discussion

    Sailfish CFD is an interesting Python program. It solves fluid flow problems on the GPU using the Lattice-Boltzmann method. Sailfish uses PyOpenCL or PyCUDA to manage the simulations and pygame as one of its visualization packages. The following steps can be used to add Sailfish CFD to PythonXY and execute the examples using OpenCL.

    Installation Steps:

    1. Install the OpenCL driver for the computer from the CPU/GPU vendor. NVIDIA drivers can be found here. Intel Drivers can be found here.
    2. Download the source using Git following the links here. If you do not use Git or don’t want to add it to your machine, a portable version is available here.
    3. Since the developers do not offer packages (yet?), use git to clone their repository. I clone to a temp directory, then work from there.
      • git clone git://gitorious.org/sailfish/sailfish.git c:\temp\sailfish
      • The repository can be viewed through a GUI using the gitk command.
    4. I like to keep the Python pieces together, so I copy the contents of the sailfish directory from my temp location to a sailfish directory created under c:\Python27.
    5. There appears to be an issue with the location of the Mako files. So following the recommendation on the Sailfish google group in this exchange, copy all of the template files from “C:\Python27\sailfish\sailfish\templates” to “C:\Python27\sailfish\sailfish”.
    6. Create a file named “sailfish.pth” at “C:\Python27\Lib\site-packages”. Edit the file and add a single line with “C:\Python27\sailfish\”. This tells python to look in that directory for the sailfish modules.
    7. Test the installation by opening Python and typing “import sailfish” at the prompt. If it imports without errors, your installation is probably good.
    8. Since PythonXY works with PyOpenCL out of the box and Sailfish works with OpenCL, the examples can now be run by using the “--backend=opencl” option. Open up “C:\Python27\sailfish\examples\lbm_cylinder.py” and try to run it using the opencl option.
    9. If your system does not have a GPU, you might encounter an error like “pyopencl.LogicError: Context failed: invalid value”. This is because Sailfish is configured to only use GPUs. I was able to get my installation to work on a system without a GPU, but with Intel OpenCL installed. This was done by changing line 31 of “C:\Python27\sailfish\sailfish\backend_opencl.py” from “devices = platform.get_devices(device_type=cl.device_type.GPU)” to “devices = platform.get_devices()”.
    The image at the top of the posting was generated from “C:\Python27\sailfish\examples\lbm_cylinder.py”.


    Test Environment:

    • Win 7 Professional
    • PythonXY 2.7.2.1
    • Sailfish git commit with SHA 1 of 29d7c934be8c02cf714ce2307796a4550428b512 from 2011-11-06 at 14:35:56
    • Intel OpenCL
    • i7 CPU, no GPU
    This work is licensed under a Creative Commons Attribution By license.